Denoising Task Difficulty-based Curriculum for Training Diffusion Models

📅 2024-03-15
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
The relative difficulty of denoising tasks across timesteps in diffusion models remains controversial. Method: This work systematically quantifies denoising difficulty per timestep, leveraging both the convergence behavior of denoising error and the relative entropy between true and predicted distributions—revealing that early (low-timestep) denoising is significantly more challenging. Building on this insight, we propose a “curriculum learning” paradigm: timesteps are clustered by difficulty and trained progressively in stages, with joint optimization of the noise schedule. Contribution/Results: Our approach departs from conventional parallel full-timestep training, requiring no architectural or loss-function modifications and remaining compatible with diverse diffusion model enhancements. Extensive experiments on unconditional generation, class-conditional generation, and text-to-image synthesis demonstrate substantial improvements in both model performance and convergence speed.
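The difficulty measurement described above can be illustrated with a small sketch. This is not the paper's implementation: the synthetic loss curves, the `convergence_rate` helper, and the decay constants are all hypothetical, chosen only so that low timesteps converge slowly (hard) and high timesteps converge quickly (easy), matching the paper's finding.

```python
# Hypothetical sketch: rank denoising timesteps by difficulty using the
# convergence behavior of their per-timestep training loss. All names and
# the synthetic loss curves below are illustrative, not from the paper.

def convergence_rate(losses):
    """Fraction of the initial loss eliminated by the end of training.
    Lower values indicate slower convergence, i.e. a harder task."""
    return (losses[0] - losses[-1]) / losses[0]

# Synthetic per-timestep loss curves: the decay ratio shrinks as the
# timestep t grows, so low timesteps decay slowly (hard) and high
# timesteps decay quickly (easy).
loss_curves = {
    t: [(0.90 - 0.0004 * t) ** step for step in range(50)]
    for t in range(0, 1000, 100)
}

# Difficulty score: how much loss remains unconverged per timestep.
difficulty = {t: 1.0 - convergence_rate(c) for t, c in loss_curves.items()}
ranked = sorted(difficulty, key=difficulty.get)  # easiest -> hardest
```

Under these toy curves, `ranked` places the highest timesteps first and timestep 0 last, mirroring the paper's observation that early (low-timestep) denoising is the hardest task.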

📝 Abstract
Diffusion-based generative models have emerged as powerful tools in the realm of generative modeling. Despite extensive research on denoising across various timesteps and noise levels, a conflict persists regarding the relative difficulties of the denoising tasks. While various studies argue that lower timesteps present more challenging tasks, others contend that higher timesteps are more difficult. To address this conflict, our study undertakes a comprehensive examination of task difficulties, focusing on convergence behavior and changes in relative entropy between consecutive probability distributions across timesteps. Our observational study reveals that denoising at earlier timesteps poses challenges characterized by slower convergence and higher relative entropy, indicating increased task difficulty at these lower timesteps. Building on these observations, we introduce an easy-to-hard learning scheme, drawing from curriculum learning, to enhance the training process of diffusion models. By organizing timesteps or noise levels into clusters and training models with ascending orders of difficulty, we facilitate an order-aware training regime, progressing from easier to harder denoising tasks, thereby deviating from the conventional approach of training diffusion models simultaneously across all timesteps. Our approach leads to improved performance and faster convergence by leveraging benefits of curriculum learning, while maintaining orthogonality with existing improvements in diffusion training techniques. We validate these advantages through comprehensive experiments in image generation tasks, including unconditional, class-conditional, and text-to-image generation.
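The easy-to-hard scheme in the abstract can be sketched as a staged timestep sampler. This is a minimal illustration under assumed settings: `T = 1000` timesteps, five equal-width contiguous clusters, and a fixed stage schedule (`steps_per_stage`) are all hypothetical hyperparameters, not the paper's configuration. Because low timesteps were found hardest, training begins with the highest-timestep cluster and progressively unlocks lower-timestep clusters.

```python
import random

T = 1000          # assumed number of diffusion timesteps
N_CLUSTERS = 5    # assumed number of difficulty clusters

# Cluster timesteps by difficulty: here simply contiguous ranges, with
# cluster 0 = highest timesteps (easiest), cluster 4 = lowest (hardest).
clusters = [
    list(range(T - (i + 1) * T // N_CLUSTERS, T - i * T // N_CLUSTERS))
    for i in range(N_CLUSTERS)
]

def sample_timestep(stage, rng=random):
    """At curriculum stage s, draw a timestep uniformly from the union of
    the first s+1 clusters, so easier clusters stay in the training mix
    as harder ones are unlocked."""
    pool = [t for c in clusters[:stage + 1] for t in c]
    return rng.choice(pool)

def stage_for_step(step, steps_per_stage=10_000):
    """Advance the curriculum stage on a fixed schedule (hypothetical)."""
    return min(step // steps_per_stage, N_CLUSTERS - 1)
```

In a training loop, `sample_timestep(stage_for_step(step))` would replace uniform sampling over all timesteps; by the final stage the pool covers every timestep, recovering standard training.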
Problem

Research questions and friction points this paper is trying to address.

Denoising task difficulty conflict
Curriculum learning for diffusion models
Improved convergence in image generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Easy-to-hard curriculum learning
Order-aware training regime
Improved convergence and performance
Jin-Young Kim
Twelvelabs
Diffusion model · Artificial Intelligence · Generative model · Representation learning
Hyojun Go
PhD @ ETH Zurich / Google
Diffusion Model · 3D Generation
Soonwoo Kwon
TwelveLabs, Seoul, South Korea
Hyun-Gyoon Kim
Dept. of Financial Engineering, Ajou University, Gyeonggi-do, South Korea