🤖 AI Summary
Video super-resolution (VSR) suffers from a lack of theoretical foundations for method selection, poor interpretability, and weak task adaptability. To address these issues, this work introduces the first comprehensive, multi-level taxonomy for deep learning–based VSR, systematically categorizing core methodologies—including optical flow estimation, recursive and Transformer-based architectures, spatiotemporal alignment, motion compensation, and multi-frame fusion—and elucidating their architectural evolution, performance gains, and design trade-offs. By synthesizing commonalities and distinctions across state-of-the-art models, we explicitly characterize how each module constrains accuracy, computational efficiency, and robustness. This taxonomy significantly enhances model interpretability and enables task-driven, customizable VSR modeling. It provides both a theoretical framework and practical guidelines for developing efficient, application-specific VSR systems—e.g., in medical imaging or satellite video analysis.
📝 Abstract
Video super-resolution (VSR) is a prominent research topic in low-level computer vision, where deep learning technologies have played a significant role. The rapid progress in deep learning and its applications in VSR has led to a proliferation of tools and techniques in the literature. However, the usage of these methods is often not adequately explained, and decisions are primarily driven by quantitative improvements. Given the significance of VSR's potential influence across multiple domains, it is imperative to conduct a comprehensive analysis of the elements and deep learning methodologies employed in VSR research. This methodical analysis will facilitate the informed development of models tailored to specific application needs. In this paper, we present an overarching overview of deep learning-based video super-resolution models, investigating each component and discussing its implications. Furthermore, we provide a synopsis of key components and technologies employed by state-of-the-art and earlier VSR models. By elucidating the underlying methodologies and categorising them systematically, we identified trends, requirements, and challenges in the domain. As a first-of-its-kind survey of deep learning-based VSR models, this work also establishes a multi-level taxonomy to guide current and future VSR research, enhancing the maturation and interpretation of VSR practices for various practical applications.