🤖 AI Summary
To address the high redundancy and excessive resource consumption of image super-resolution (ISR) training data, this paper proposes the first instance-level data condensation framework tailored for ISR. Methodologically, it introduces random local Fourier feature extraction, jointly optimized with multi-level feature distribution matching and gradient matching, to preserve both global structural consistency and local texture fidelity. Using only 10% of the original DIV2K samples, the framework synthesizes a compact, high-quality training set. When used to train mainstream models, including EDSR and RCAN, this condensed dataset achieves PSNR/SSIM scores comparable to, or in some cases exceeding, those obtained with the full dataset, while also yielding more stable convergence. This work is the first systematic application of instance-level data condensation to ISR, significantly improving data efficiency and pointing toward lightweight, privacy-preserving super-resolution training.
📝 Abstract
Deep learning-based image Super-Resolution (ISR) relies on large training datasets to optimize model generalization, which demands substantial computational and storage resources during training. While dataset condensation has shown potential for improving data efficiency and privacy in high-level computer vision tasks, it has not yet been fully exploited for ISR. In this paper, we propose a novel Instance Data Condensation (IDC) framework specifically for ISR, which achieves instance-level data condensation through Random Local Fourier Feature Extraction and Multi-level Feature Distribution Matching. This optimizes feature distributions at both global and local levels, yielding high-quality synthesized training content with fine detail. We have applied this framework to condense DIV2K, the most commonly used training dataset for ISR, at a 10% condensation rate. The resulting synthetic dataset offers comparable or, in certain cases, even better performance than the original full dataset, together with excellent training stability, when used to train various popular ISR models. To the best of our knowledge, this is the first time that a condensed/synthetic dataset (with 10% of the data volume) has demonstrated such performance. The source code and the synthetic dataset have been made available at https://github.com/.
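To make the two core ingredients concrete, the sketch below illustrates the general idea behind random local Fourier features and feature distribution matching: sample random local patches from an image, take the magnitude of their 2D FFT as features, and penalize differences between the feature statistics of real and synthetic data. This is a minimal illustration under assumed details, not the paper's actual implementation; the helper names (`random_local_fourier_features`, `distribution_matching_loss`) and the choice of mean/std statistics are hypothetical.

```python
import numpy as np

def random_local_fourier_features(img, num_patches=8, patch=16, rng=None):
    # Hypothetical helper: sample random local crops from a (H, W) image
    # and use the magnitude of their 2D FFT as frequency-domain features.
    rng = np.random.default_rng(rng)
    h, w = img.shape
    feats = []
    for _ in range(num_patches):
        y = rng.integers(0, h - patch + 1)
        x = rng.integers(0, w - patch + 1)
        crop = img[y:y + patch, x:x + patch]
        feats.append(np.abs(np.fft.fft2(crop)).ravel())
    return np.stack(feats)  # shape: (num_patches, patch * patch)

def distribution_matching_loss(real_feats, syn_feats):
    # Match first- and second-order statistics (mean and std) of the two
    # feature sets; zero when the distributions' moments coincide.
    mu_gap = np.mean(real_feats, axis=0) - np.mean(syn_feats, axis=0)
    sd_gap = np.std(real_feats, axis=0) - np.std(syn_feats, axis=0)
    return float(np.mean(mu_gap ** 2) + np.mean(sd_gap ** 2))
```

In an actual condensation loop, the synthetic images would be treated as learnable tensors and optimized (e.g., by gradient descent in an autodiff framework) to minimize such a loss against features extracted from the real dataset.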