🤖 AI Summary
Weak cross-domain generalization, particularly the severe performance degradation seen at extreme scaling factors (e.g., ×8/×16), remains a critical challenge in single-image super-resolution (SISR). To address this, we propose the first few-shot adaptive SISR framework that integrates CLIP-guided semantic priors with meta-learning. Our method introduces a novel CLIP-guided feature alignment mechanism and a multi-stage domain adaptation module, embedding vision-language priors deep within the reconstruction pipeline. A semantic consistency loss is further designed to enforce the high-level semantic plausibility of reconstructed outputs. Requiring only a few target-domain samples for rapid adaptation, our approach achieves PSNR gains of +0.15 dB and +0.30 dB on Urban100 at ×8 and ×16 scaling, respectively, outperforming state-of-the-art domain-generalization SISR methods and demonstrating superior robustness in cross-domain and large-scale-factor scenarios.
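The summary describes the training objective only at a high level. As a rough illustration, a weighted three-term objective of this general shape (pixel reconstruction, perceptual feature distance, and a cosine-based semantic consistency term on CLIP-style embeddings) could be sketched as below; all function names, weights, and distance choices are assumptions, not the authors' implementation.

```python
import numpy as np

def cosine_sim(a, b, eps=1e-8):
    # Cosine similarity between two embedding vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))

def cdasr_style_loss(sr, hr, sr_feat, hr_feat, sr_emb, hr_emb,
                     w_pix=1.0, w_perc=0.1, w_sem=0.05):
    """Hypothetical three-term objective. The weights and the L1/L2
    distance choices are illustrative assumptions, not the paper's values."""
    # Pixel-wise reconstruction: L1 between SR output and HR ground truth.
    l_pix = np.mean(np.abs(sr - hr))
    # Perceptual similarity: L2 between features from a fixed backbone.
    l_perc = np.mean((sr_feat - hr_feat) ** 2)
    # Semantic consistency: pull CLIP-style embeddings of SR and HR together.
    l_sem = 1.0 - cosine_sim(sr_emb, hr_emb)
    return w_pix * l_pix + w_perc * l_perc + w_sem * l_sem
```

When the SR output matches the ground truth and the embeddings coincide, all three terms vanish, so the loss is minimized exactly at perfect reconstruction.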
📝 Abstract
This work introduces CLIP-aware Domain-Adaptive Super-Resolution (CDASR), a novel framework that addresses the critical challenge of domain generalization in single image super-resolution. By leveraging the semantic capabilities of CLIP (Contrastive Language-Image Pre-training), CDASR maintains strong performance across diverse domains and extreme scaling factors. The proposed method integrates a CLIP-guided feature alignment mechanism with a meta-learning-inspired few-shot adaptation strategy, enabling efficient knowledge transfer and rapid adaptation to target domains. A custom domain-adaptive module processes CLIP features alongside super-resolution features through a multi-stage transformation: CLIP feature processing, spatial feature generation, and feature fusion. This design ensures effective incorporation of semantic information into the super-resolution pipeline. Additionally, CDASR employs a multi-component loss function that combines pixel-wise reconstruction, perceptual similarity, and semantic consistency terms. Extensive experiments on benchmark datasets demonstrate CDASR's superiority, particularly in challenging scenarios. On the Urban100 dataset at $\times$8 scaling, CDASR achieves a significant PSNR gain of 0.15dB over existing methods, with even larger improvements of up to 0.30dB observed at $\times$16 scaling.
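The abstract names the three stages of the domain-adaptive module without detailing them. A minimal sketch under assumed shapes, where a global CLIP embedding is projected to the SR channel count, broadcast into a spatial map, and fused with the SR features by a 1x1-convolution-like mixing, might look as follows; all dimensions, projections, and names are illustrative assumptions rather than the paper's architecture.

```python
import numpy as np

def domain_adaptive_module(clip_emb, sr_feat, w_proj, w_fuse):
    """Hypothetical three-stage fusion (shapes are illustrative assumptions).
    clip_emb: (d_clip,)   global CLIP image embedding
    sr_feat:  (c, h, w)   super-resolution feature map
    w_proj:   (c, d_clip) stage-1 projection weights
    w_fuse:   (c, 2c)     stage-3 fusion weights (acts like a 1x1 conv)
    """
    c, h, w = sr_feat.shape
    # Stage 1: CLIP feature processing - project the embedding to c channels.
    sem = w_proj @ clip_emb                             # (c,)
    # Stage 2: spatial feature generation - broadcast to a (c, h, w) map.
    sem_map = np.broadcast_to(sem[:, None, None], (c, h, w))
    # Stage 3: feature fusion - concatenate on channels, mix back to c channels.
    cat = np.concatenate([sr_feat, sem_map], axis=0)    # (2c, h, w)
    fused = np.einsum('oc,chw->ohw', w_fuse, cat)       # (c, h, w)
    return fused
```

The fused map keeps the spatial resolution of the SR features while injecting the global semantic signal at every location, which is one plausible way to realize the fusion the abstract describes.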