🤖 AI Summary
Existing super-resolution (SR) methods ignore the perceptual selectivity of the human visual system (HVS), that is, its varying sensitivity to luminance, contrast, spatial frequency, motion, and viewing conditions (e.g., illumination, viewing distance). As a result, they spend computation on detail that viewers cannot resolve. To address this, we propose an architecture-agnostic, perception-driven dynamic optimization framework for SR that models HVS sensitivity as a control signal to jointly optimize perceptual quality and computational efficiency. The method uses this signal for physiology-informed dynamic weight scheduling, conditional branch selection, and complexity-adaptive pruning, reducing FLOPs by a factor of 2× or more without visible quality loss. It supports real-time multi-scale and multi-frame video SR. User studies confirm that perceived quality is preserved. This work establishes a new paradigm for efficient vision computing grounded in human perception principles.
📝 Abstract
Modern deep-learning-based super-resolution techniques process images and videos independently of the underlying content and viewing conditions. However, the sensitivity of the human visual system to image details varies with content characteristics such as spatial frequency, luminance, color, contrast, and motion. This suggests that computational resources spent on up-sampling visual content are wasted whenever a viewer cannot resolve the result. Motivated by this observation, we propose a perceptually inspired, architecture-agnostic approach for controlling the visual quality and efficiency of super-resolution techniques. At its core is a perceptual model that dynamically guides super-resolution methods according to human sensitivity to image details. Our technique exploits the limitations of the human visual system to improve the efficiency of super-resolution techniques by focusing computational resources on perceptually important regions, judged by factors such as adapting luminance, contrast, spatial frequency, motion, and viewing conditions. We demonstrate our model in combination with network branching and network complexity reduction, improving the computational efficiency of super-resolution methods without visible quality loss. Quantitative and qualitative evaluations, including user studies, show that our approach reduces FLOPs by a factor of 2× or more without sacrificing perceived quality.
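The abstract does not spell out the perceptual model, so the following is only a minimal sketch of the general idea: score each image tile by how visible the detail that SR would add is likely to be, then route it to an expensive or a cheap branch. It is not the authors' model. It assumes the Mannos-Sakrison contrast sensitivity function (CSF), RMS contrast as the content measure, and a saturating luminance factor. The names `HALF_SAT_NITS`, `threshold`, `peak_nits`, `full_flops`, and `cheap_flops` and their values are hypothetical placeholders, and motion and color are not modeled.

```python
import math
from statistics import mean, pstdev

HALF_SAT_NITS = 20.0  # hypothetical half-saturation luminance (cd/m^2) for the luminance factor


def csf_mannos_sakrison(f_cpd: float) -> float:
    """Mannos-Sakrison CSF (relative sensitivity, peaks near 8 cycles/degree)."""
    return 2.6 * (0.0192 + 0.114 * f_cpd) * math.exp(-((0.114 * f_cpd) ** 1.1))


def pixels_per_degree(distance_m: float, pixel_pitch_m: float) -> float:
    """Approximate number of display pixels spanned by one degree of visual angle."""
    return distance_m * math.tan(math.radians(1.0)) / pixel_pitch_m


def band_sensitivity(ppd: float, scale: int, samples: int = 32) -> float:
    """Peak CSF over the band SR must synthesize: LR Nyquist up to HR Nyquist."""
    f_lo, f_hi = ppd / (2 * scale), ppd / 2
    step = (f_hi - f_lo) / (samples - 1)
    return max(csf_mannos_sakrison(f_lo + i * step) for i in range(samples))


def split_tiles(img, tile):
    """Split a 2D list of luminance values in [0, 1] into flat lists, row-major."""
    h, w = len(img), len(img[0])
    return [
        [img[y][x] for y in range(ty, min(ty + tile, h)) for x in range(tx, min(tx + tile, w))]
        for ty in range(0, h, tile)
        for tx in range(0, w, tile)
    ]


def route_tiles(tiles, ppd, scale, peak_nits=200.0, threshold=0.05):
    """Return 'full' for tiles whose added detail is likely visible, else 'cheap'."""
    s_band = band_sensitivity(ppd, scale)  # viewing conditions + spatial frequency
    decisions = []
    for t in tiles:
        m = mean(t)
        contrast = pstdev(t) / m if m > 0 else 0.0  # RMS contrast of the tile
        lum = m * peak_nits  # crude adapting-luminance estimate
        lum_factor = lum / (lum + HALF_SAT_NITS)
        score = contrast * s_band * lum_factor
        decisions.append("full" if score >= threshold else "cheap")
    return decisions


def flop_reduction(decisions, full_flops, cheap_flops):
    """Ratio of all-full-branch cost to the mixed-routing cost."""
    baseline = full_flops * len(decisions)
    mixed = sum(full_flops if d == "full" else cheap_flops for d in decisions)
    return baseline / mixed


if __name__ == "__main__":
    # 8x8 test image: flat gray on the left, high-contrast checkerboard on the right.
    img = [[0.5 if x < 4 else (0.8 if (x + y) % 2 else 0.2) for x in range(8)] for y in range(8)]
    tiles = split_tiles(img, 4)
    near = route_tiles(tiles, pixels_per_degree(0.6, 0.00025), scale=2)
    far = route_tiles(tiles, pixels_per_degree(5.0, 0.00025), scale=2)
    print(near, flop_reduction(near, 100.0, 1.0))
    print(far, flop_reduction(far, 100.0, 1.0))
```

At a 0.6 m viewing distance only the checkerboard tiles go to the expensive branch. At 5 m the band that SR would synthesize lies far above the CSF's sensitive range, so every tile takes the cheap path. This mirrors, in toy form, how viewing conditions alone can change the compute budget. The multiplicative score is a simplification chosen for readability; a real model would calibrate each factor against psychophysical data.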