Zero-shot Adaptation of Stable Diffusion via Plug-in Hierarchical Degradation Representation for Real-World Super-Resolution

📅 2025-12-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Real-world image super-resolution (Real-ISR) must model degradations that are unknown, coupled, and diverse. Existing methods rely on predefined degradation levels and on CLIP text embeddings that cannot express numerical severity, which limits generalization. To address this, we propose Hierarchical Degradation CLIP (HD-CLIP), the first framework to decouple semantic content from an ordinal degradation representation that can be interpolated across levels. We further introduce Classifier-Free Projection Guidance (CFPG), which enables degradation-aware, zero-shot control of diffusion generation without classifier-based conditioning. The method requires no fine-tuning and plugs into diffusion backbones such as Stable Diffusion, providing two guidance paths: one semantic and one degradation-aware. Evaluated on real-world degradation datasets from multiple sources, the approach improves detail fidelity and perceptual realism while suppressing hallucinations and artifacts. It also transfers zero-shot and integrates with existing SR frameworks.

📝 Abstract

Real-World Image Super-Resolution (Real-ISR) aims to recover high-quality images from low-quality inputs degraded by unknown and complex real-world factors. Real-world scenarios involve diverse and coupled degradations, making it necessary to provide diffusion models with richer and more informative guidance. However, existing methods often assume known degradation severity and rely on CLIP text encoders that cannot capture numerical severity, limiting their generalization ability. To address this, we propose HD-CLIP (Hierarchical Degradation CLIP), which decomposes a low-quality image into a semantic embedding and an ordinal degradation embedding that captures ordered relationships and allows interpolation across unseen levels. Furthermore, we integrate it into diffusion models via classifier-free guidance (CFG) and propose classifier-free projection guidance (CFPG). HD-CLIP leverages semantic cues to guide generative restoration while using degradation cues to suppress undesired hallucinations and artifacts. As a plug-and-play module, HD-CLIP can be seamlessly integrated into various super-resolution frameworks without training, significantly improving detail fidelity and perceptual realism across diverse real-world datasets.
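For readers unfamiliar with the classifier-free guidance (CFG) that the abstract builds on, the sketch below shows only the standard CFG combination rule, in which the guided noise prediction is the unconditional prediction plus a scaled difference between the conditional and unconditional predictions. It does not reproduce CFPG, whose projection step is not specified on this page. The function name `cfg_combine` and the use of flat Python lists in place of tensors are assumptions made for illustration.

```python
def cfg_combine(eps_uncond, eps_cond, scale):
    """Standard classifier-free guidance on two noise predictions.

    Returns eps_uncond + scale * (eps_cond - eps_uncond), element-wise.
    Illustrative only: flat lists stand in for the tensors a diffusion
    model would produce, and this is plain CFG, not the paper's CFPG.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


# A scale of 1 returns the conditional prediction unchanged; larger
# scales push the result further along the conditional direction.
guided = cfg_combine([0.0, 1.0], [1.0, 3.0], 2.0)
```

With a scale of 0 the condition is ignored entirely, which is why the scale acts as a knob between unconditional and strongly conditioned generation.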

Problem

Research questions and friction points this paper is trying to address.

Enhances real-world image super-resolution with unknown degradations

Introduces hierarchical degradation embedding for better generalization

Provides plug-and-play module to reduce artifacts without retraining

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical degradation embedding captures ordered relationships

Classifier-free projection guidance integrates degradation cues

Plug-and-play module enhances super-resolution without training
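One way to picture how an ordinal embedding can support interpolation across unseen levels is linear interpolation between the embeddings of neighboring integer severity levels. The sketch below is a minimal illustration under that assumption; the page does not state how HD-CLIP interpolates, and the name `interpolate_level`, the anchor layout, and the toy values are all hypothetical.

```python
import math


def interpolate_level(anchors, level):
    """Linearly interpolate an embedding for a fractional severity level.

    anchors: list of embeddings, where anchors[k] is the (hypothetical)
             embedding for integer degradation level k.
    level:   float in [0, len(anchors) - 1].
    Assumption: linear interpolation between adjacent levels; the actual
    HD-CLIP scheme may differ.
    """
    top = len(anchors) - 1
    if not 0 <= level <= top:
        raise ValueError("level is outside the range of known anchors")
    lo = int(math.floor(level))
    hi = min(lo + 1, top)
    t = level - lo  # fractional position between the two anchors
    return [(1 - t) * a + t * b for a, b in zip(anchors[lo], anchors[hi])]


# Toy anchors for levels 0, 1, and 2; level 1.5 was never seen directly.
toy_anchors = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
mid_embedding = interpolate_level(toy_anchors, 1.5)
```

Because the levels are ordered, a severity of 1.5 lands between the level-1 and level-2 embeddings, which is a property that an unordered set of text prompts would not guarantee.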

🔎 Similar Papers

TDDSR: Single-Step Diffusion with Two Discriminators for Super Resolution