🤖 AI Summary
In real-world image super-resolution, diffusion models recover structural detail poorly because they sample noise from a single, fixed forward distribution regardless of image content. To address this, we propose a structure-aware diffusion super-resolution method with zero inference overhead. Our core innovation is the integration of fine-grained semantic structural priors—extracted from the Segment Anything Model (SAM)—as implicit guidance that modulates the mean of the forward diffusion noise in a region-adaptive manner. During training, these structural cues steer denoising; crucially, at inference time the model runs entirely without SAM, incurring no additional computational cost. Evaluated on DIV2K, our method achieves up to a +0.74 dB PSNR gain over existing diffusion-based SR methods, suppresses common artifacts (e.g., blurring and checkerboard patterns), and delivers strong fidelity and perceptual quality.
📝 Abstract
Diffusion-based super-resolution (SR) models have recently garnered significant attention due to their potent restoration capabilities. However, conventional diffusion models sample noise from a single distribution, constraining their ability to handle real-world scenes and complex textures that vary across semantic regions. With the success of the Segment Anything Model (SAM), sufficiently fine-grained region masks can enhance the detail recovery of diffusion-based SR models. However, directly integrating SAM into SR models results in much higher computational cost. In this paper, we propose the SAM-DiffSR model, which utilizes the fine-grained structure information from SAM during noise sampling to improve image quality without additional computational cost at inference. During training, we encode structural position information into the segmentation mask from SAM. The encoded mask is then integrated into the forward diffusion process by modulating the sampled noise with it. This adjustment allows us to independently adapt the noise mean within each corresponding segmentation region, and the diffusion model is trained to estimate this modulated noise. Crucially, our proposed framework does NOT change the reverse diffusion process and does NOT require SAM at inference. Experimental results demonstrate the effectiveness of our proposed method, showcasing superior performance in suppressing artifacts and surpassing existing diffusion-based methods by up to 0.74 dB in PSNR on the DIV2K dataset. The code and dataset are available at https://github.com/lose4578/SAM-DiffSR.
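The region-adaptive modulation described above can be sketched in a few lines: shift the mean of the sampled Gaussian noise independently inside each segmentation region before the usual forward-diffusion mixing step. This is a minimal NumPy sketch under stated assumptions; the function and variable names (`modulate_noise_with_mask`, `region_means`) are illustrative and not taken from the paper's code, and the per-region offsets stand in for the learned mask encoding.

```python
import numpy as np

def modulate_noise_with_mask(noise, seg_mask, region_means):
    """Shift the noise mean independently per segmentation region.

    noise        : sampled Gaussian noise, shape (H, W)
    seg_mask     : integer region IDs per pixel, shape (H, W)
    region_means : dict mapping region ID -> mean offset (stand-in
                   for the encoded SAM mask; illustrative only)
    """
    modulated = noise.copy()
    for region_id, mean_shift in region_means.items():
        modulated[seg_mask == region_id] += mean_shift
    return modulated

# Toy example: 4x4 noise field with two segmentation regions.
rng = np.random.default_rng(0)
noise = rng.standard_normal((4, 4))
seg_mask = np.zeros((4, 4), dtype=int)
seg_mask[:, 2:] = 1                # right half belongs to region 1
region_means = {0: 0.0, 1: 0.5}    # per-region mean offsets

x = modulate_noise_with_mask(noise, seg_mask, region_means)
```

The modulated noise `x` would then replace the plain Gaussian sample in the standard forward step (e.g. `x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * x`), and the denoiser is trained to predict this shifted noise; the reverse process is unchanged, which is why SAM is not needed at inference.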