🤖 AI Summary
Existing diffusion models degrade sharply on large-scale super-resolution tasks (e.g., ×8 upscaling to 2048²) because both the upsampling ratio and the output resolution exceed what the model natively supports, while directly training a native high-resolution model incurs prohibitive computational and memory costs. To address this, this work proposes TUDSR, a Twice Upsampling-Diffusion framework with two training stages: training at low resolution (R), followed by training at high resolution (NR) with a looped chunk-based strategy. Each stage uses a one-step GAN architecture consisting of a generator and a discriminator. The approach targets the resolution and scale limits without requiring a larger native high-resolution model. TUDSR-S, the variant built on SD2.1-base, achieves state-of-the-art performance across multiple benchmarks and generates high-quality images at 1024² and 2048², significantly outperforming existing methods.
📝 Abstract
Diffusion-based generative models have achieved remarkable success in real-world image super-resolution (SR). With tiled diffusion techniques, these models can produce high-resolution images that exceed their natively supported resolution. However, the quality of such high-resolution (e.g., $2048^2$) outputs often remains extremely poor, which we attribute primarily to two factors: the image upsampling ratio (e.g., $\times 8$) exceeds the model's natively supported upsampling ratio (e.g., $\times 4$), and the output resolution exceeds the model's natively supported resolution. In practice, training a native high-resolution model requires larger architectures, which incur significant computational overhead and GPU memory costs, making such training impractical on resource-limited hardware. We therefore present TUDSR, a Twice Upsampling-Diffusion framework for higher-resolution SR. The TUDSR framework consists of two main stages: the first trains at $R$-resolution, and the second introduces a looped chunk-based training strategy at $NR$-resolution. Each stage adopts a one-step GAN architecture comprising a generator and a discriminator. Based on SD2.1-base, we develop TUDSR-S, which achieves state-of-the-art performance across multiple benchmarks. Extensive experiments further demonstrate that TUDSR-S generates high-quality images at resolutions of $1024^2$ and even $2048^2$, significantly outperforming existing approaches. Code is available at https://github.com/wuer5/TUDSR.
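To make the relationship between $R$-resolution and $NR$-resolution concrete, the sketch below enumerates the $R \times R$ chunks of an $NR \times NR$ image and cycles through them. It is only an illustration of chunk bookkeeping, not the authors' implementation: the abstract does not say how chunks are selected, whether they overlap, or which values of $R$ and $N$ are used. The names `chunk_boxes` and `looped_chunks`, the non-overlapping grid, and the choice $R = 512$, $N = 4$ are all assumptions made for this example.

```python
from itertools import cycle, islice
from typing import Iterator, List, Tuple

# (top, left, bottom, right) in pixel coordinates; bottom/right are exclusive.
Box = Tuple[int, int, int, int]


def chunk_boxes(r: int, n: int) -> List[Box]:
    """Split an (n*r) x (n*r) image into n*n non-overlapping r x r chunks.

    Hypothetical helper: a plain row-major grid, assumed for illustration.
    """
    if r <= 0 or n <= 0:
        raise ValueError("r and n must be positive")
    return [
        (row * r, col * r, (row + 1) * r, (col + 1) * r)
        for row in range(n)
        for col in range(n)
    ]


def looped_chunks(r: int, n: int, steps: int) -> Iterator[Box]:
    """Yield `steps` chunk boxes, wrapping around the grid as often as needed."""
    return islice(cycle(chunk_boxes(r, n)), steps)


# Assumed example values: R = 512 and N = 4 give a 2048 x 2048 image.
boxes = chunk_boxes(r=512, n=4)
print(len(boxes))   # number of chunks in the grid
print(boxes[0])     # first chunk (top-left corner)
print(boxes[-1])    # last chunk (bottom-right corner)
```

Under these assumptions every chunk has the same $R \times R$ size as a first-stage training image, which is one plausible reason a chunk-based second stage could keep memory use close to that of the first stage.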