🤖 AI Summary
Generative adversarial networks (GANs) and diffusion models struggle to jointly achieve high inference efficiency and reconstruction fidelity in high-resolution image super-resolution. Method: the paper proposes SupResDiffGAN, a GAN-diffusion hybrid architecture that models the generative process in latent space, incorporates an adaptive noise corruption mechanism to mitigate discriminator overfitting, and reduces the number of diffusion steps to single digits while preserving perceptual quality. Contribution/Results: on standard benchmarks the method outperforms SR3 and I²SB across PSNR, SSIM, and LPIPS, and it achieves near real-time inference without compromising perceptual quality, narrowing the long-standing speed-quality trade-off between diffusion models and GANs.
📝 Abstract
In this work, we present SupResDiffGAN, a novel hybrid architecture that combines the strengths of Generative Adversarial Networks (GANs) and diffusion models for super-resolution tasks. By leveraging latent space representations and reducing the number of diffusion steps, SupResDiffGAN achieves significantly faster inference times than other diffusion-based super-resolution models while maintaining competitive perceptual quality. To prevent discriminator overfitting, we propose adaptive noise corruption, ensuring a stable balance between the generator and the discriminator during training. Extensive experiments on benchmark datasets show that our approach outperforms traditional diffusion models such as SR3 and I²SB in efficiency and image quality. This work bridges the performance gap between diffusion- and GAN-based methods, laying the foundation for real-time applications of diffusion models in high-resolution image generation.
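The abstract does not spell out how adaptive noise corruption is scheduled, but mechanisms of this kind typically raise the noise injected into the discriminator's inputs when it wins too easily and lower it as the generator catches up. The sketch below is a minimal, hypothetical illustration of that feedback loop (the class name, the target accuracy of 0.8, and the step size are assumptions, not the paper's actual values):

```python
import numpy as np


class AdaptiveNoiseCorruption:
    """Hypothetical sketch: scale Gaussian noise on discriminator inputs
    by a feedback signal (recent discriminator accuracy), so that an
    overconfident discriminator sees noisier samples.
    """

    def __init__(self, target_acc=0.8, step=0.01, max_sigma=1.0):
        self.target_acc = target_acc  # accuracy above which noise grows
        self.step = step              # sigma adjustment per update
        self.max_sigma = max_sigma    # upper bound on noise level
        self.sigma = 0.0              # current noise standard deviation

    def update(self, disc_accuracy):
        # Increase noise when the discriminator is too accurate
        # (overfitting risk); decay it otherwise.
        if disc_accuracy > self.target_acc:
            self.sigma = min(self.sigma + self.step, self.max_sigma)
        else:
            self.sigma = max(self.sigma - self.step, 0.0)
        return self.sigma

    def corrupt(self, x, rng):
        # Add zero-mean Gaussian noise scaled by the adaptive sigma.
        return x + self.sigma * rng.standard_normal(x.shape)
```

In training, `update` would be called once per discriminator step with a running accuracy estimate, and `corrupt` applied to both real and generated samples before they reach the discriminator, keeping the two-player game balanced.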