🤖 AI Summary
To address the longstanding trade-off between perceptual quality and fidelity in image restoration, this paper proposes the first conditional control framework that integrates the Mamba state-space model with diffusion models. Methodologically, it incorporates the Mamba architecture into the conditioning network of the diffusion model, replacing conventional CNN- and attention-based designs, and shows empirically that direct noise prediction yields better perceptual quality while preserving structural detail. Experiments on the Rain100H, Rain100L, GoPro, and SSID benchmarks show that the approach achieves consistently lower LPIPS and FID scores than state-of-the-art methods while maintaining competitive PSNR and SSIM values. The work is the first to validate Mamba-driven diffusion models for fine-grained generative control and for the joint optimization of perceptual quality and fidelity.
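The summary does not describe how the Mamba conditioning network is built. As a rough illustration of what replacing CNN or attention layers with Mamba involves, the sketch below implements the selective state-space scan at the core of Mamba in NumPy: the step size Δ and the projections B and C are computed from each input token, so the state update depends on the input. This is a minimal sketch under stated assumptions, not the authors' implementation. The function `selective_scan`, the weight names `W_delta`, `W_B`, `W_C`, and the simplified discretization B̄ = ΔB are illustrative choices, and a full Mamba block also adds gating, a short convolution, and a hardware-aware parallel scan.

```python
import numpy as np


def softplus(z):
    # Numerically stable softplus: log(1 + exp(z))
    return np.logaddexp(0.0, z)


def selective_scan(x, A, W_delta, W_B, W_C):
    """Sequential selective SSM scan (illustrative sketch, hypothetical names).

    x:       (L, D) sequence of L tokens with D channels
    A:       (D, N) diagonal state matrix per channel (negative entries for stability)
    W_delta: (D, D) hypothetical projection producing the per-channel step size
    W_B:     (D, N) hypothetical projection producing the input matrix B_t
    W_C:     (D, N) hypothetical projection producing the output matrix C_t
    Returns y of shape (L, D).
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))            # hidden state: N state dims per channel
    y = np.empty((L, D))
    for t in range(L):
        xt = x[t]                               # (D,)
        delta = softplus(xt @ W_delta)          # (D,) input-dependent step size
        B = xt @ W_B                            # (N,) input-dependent B_t
        C = xt @ W_C                            # (N,) input-dependent C_t
        A_bar = np.exp(delta[:, None] * A)      # (D, N) zero-order-hold discretization of A
        B_bar = delta[:, None] * B[None, :]     # (D, N) simplified discretization B_bar = delta * B
        h = A_bar * h + B_bar * xt[:, None]     # recurrent state update
        y[t] = h @ C                            # (D,) readout
    return y
```

To apply it to an image as a conditioning encoder, one hypothetical option is to flatten a feature map of shape (H, W, D) into a raster-order sequence of length H·W; vision Mamba variants typically scan in several directions and merge the results, which this sketch omits. The loop keeps a fixed-size state and runs in time linear in the sequence length, in contrast to the quadratic-in-length cost of self-attention.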
📝 Abstract
This paper proposes ControlMambaIR, a novel image restoration method designed to address perceptual challenges in image deraining, deblurring, and denoising. ControlMambaIR integrates the Mamba network architecture into the conditioning network of a diffusion model, enabling refined conditional control over the image generation process. To evaluate the robustness and generalization of the method across different types of image degradation, extensive experiments were conducted on several benchmark datasets, including Rain100H, Rain100L, GoPro, and SSID. The results show that the proposed approach consistently surpasses existing methods on perceptual quality metrics such as LPIPS and FID, while remaining comparable on distortion metrics such as PSNR and SSIM. Notably, ablation experiments reveal that direct noise prediction in the diffusion process performs better, effectively balancing noise suppression and detail preservation. The findings further indicate that the Mamba architecture is particularly well suited as a conditional control network for diffusion models, outperforming both CNN- and attention-based alternatives in this role. Overall, these results demonstrate the flexibility and effectiveness of ControlMambaIR in addressing perceptual challenges across a range of image restoration tasks.
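The abstract reports that direct noise prediction works best but does not state what it was compared against. Assuming the comparison is the common one between predicting the added noise ε and predicting the clean image x₀, the sketch below shows both training targets under a DDPM-style forward process, x_t = √ᾱ_t · x₀ + √(1 − ᾱ_t) · ε, and how a predicted ε is converted back into an x₀ estimate. The linear β schedule, the function names, and the `model(x_t, t, cond)` signature (where `cond` stands for the degraded input passed to the conditioning network) are hypothetical and not taken from the paper.

```python
import numpy as np


def linear_alpha_bar(T=1000, beta_start=1e-4, beta_end=0.02):
    # Hypothetical linear beta schedule; alpha_bar[t] = prod_{s<=t} (1 - beta_s)
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)


def q_sample(x0, t, eps, alpha_bar):
    # Forward diffusion: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    a = alpha_bar[t]
    return np.sqrt(a) * x0 + np.sqrt(1.0 - a) * eps


def eps_to_x0(x_t, t, eps_hat, alpha_bar):
    # Invert the forward equation to turn a noise prediction into a clean-image estimate
    a = alpha_bar[t]
    return (x_t - np.sqrt(1.0 - a) * eps_hat) / np.sqrt(a)


def training_loss(model, x0, cond, alpha_bar, rng, target="eps"):
    # One Monte Carlo sample of the denoising loss.
    # target="eps": regress the added noise; target="x0": regress the clean image.
    if target not in ("eps", "x0"):
        raise ValueError("target must be 'eps' or 'x0'")
    t = int(rng.integers(len(alpha_bar)))
    eps = rng.standard_normal(x0.shape)
    x_t = q_sample(x0, t, eps, alpha_bar)
    pred = model(x_t, t, cond)  # hypothetical conditional denoiser
    goal = eps if target == "eps" else x0
    return float(np.mean((pred - goal) ** 2))
```

At a fixed t, the two targets carry the same information, since `eps_to_x0` maps one onto the other; what changes is the per-timestep weighting of the loss, because ‖x₀ − x̂₀‖² = ((1 − ᾱ_t)/ᾱ_t) · ‖ε − ε̂‖². That reweighting is one plausible reason the choice of target shifts the balance between noise suppression and detail preservation.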