🤖 AI Summary
Diffusion models suffer from slow sampling and poor early-stage reconstruction quality—particularly in lightweight architectures—due to trajectory branching delays. To address this, we propose Gaussianization preprocessing: prior to training, image data are transformed into an approximately independent standard Gaussian distribution via invertible mappings (e.g., normalizing flows or empirical CDF transforms). This is the first integration of such reversible Gaussianization into the diffusion training pipeline, aiming to mitigate branching at the data-distribution level. Experiments across multiple image benchmarks demonstrate that, under small UNet architectures, our method improves PSNR by 12% in early denoising steps, reduces sampling steps by 30% without sacrificing visual fidelity, and yields significant FID reduction alongside accelerated convergence. Our core contribution lies in reshaping the data distribution to lower task complexity, thereby enhancing stability, efficiency, and early reconstruction quality—especially for resource-constrained diffusion models.
📝 Abstract
Diffusion models are a class of generative models that have demonstrated remarkable success in tasks such as image generation. However, one of the bottlenecks of these models is slow sampling due to the delay before the onset of trajectory bifurcation, at which point substantial reconstruction begins. This issue degrades generation quality, especially in the early stages. Our primary objective is to mitigate bifurcation-related issues by preprocessing the training data to enhance reconstruction quality, particularly for small-scale network architectures. Specifically, we propose applying Gaussianization preprocessing to the training data to make the target distribution more closely resemble an independent Gaussian distribution, which serves as the initial density of the reconstruction process. This preprocessing step simplifies the model's task of learning the target distribution, thereby improving generation quality even in the early stages of reconstruction with small networks. The proposed method is, in principle, applicable to a broad range of generative tasks, enabling more stable and efficient sampling processes.