Adaptive Non-Uniform Timestep Sampling for Diffusion Model Training

📅 2024-11-15
🏛️ arXiv.org
📈 Citations: 3
Influential: 0
🤖 AI Summary
In diffusion model training, uniform timestep sampling ignores the heterogeneous gradient variance across timesteps, leaving high-variance timesteps as convergence bottlenecks. To address this, we propose an online evaluation mechanism that assesses, at each iteration, the impact of gradient updates on the objective function, enabling adaptive non-uniform timestep sampling focused on optimization-sensitive timesteps. The method goes beyond static weighting and heuristic sampling by combining gradient variance analysis, objective impact tracking, and importance sampling, achieving principled real-time timestep prioritization without precomputed statistics or architectural modifications. Experiments across diverse datasets, noise schedules, and network architectures show consistent improvements: the approach accelerates convergence and yields better final model performance than state-of-the-art timestep sampling and weighting strategies.

📝 Abstract
As highly expressive generative models, diffusion models have demonstrated exceptional success across various domains, including image generation, natural language processing, and combinatorial optimization. However, as data distributions grow more complex, training these models to convergence becomes increasingly computationally intensive. While diffusion models are typically trained using uniform timestep sampling, our research shows that the variance in stochastic gradients varies significantly across timesteps, with high-variance timesteps becoming bottlenecks that hinder faster convergence. To address this issue, we introduce a non-uniform timestep sampling method that prioritizes these more critical timesteps. Our method tracks the impact of gradient updates on the objective for each timestep, adaptively selecting those most likely to minimize the objective effectively. Experimental results demonstrate that this approach not only accelerates the training process, but also leads to improved performance at convergence. Furthermore, our method shows robust performance across various datasets, scheduling strategies, and diffusion architectures, outperforming previously proposed timestep sampling and weighting heuristics that lack this degree of robustness.
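The abstract's core mechanism, tracking each timestep's effect on the objective and sampling high-impact timesteps more often, can be sketched as below. This is a minimal illustration under assumed design choices (an exponential moving average of observed loss improvement per timestep, blended with uniform sampling for exploration); all names and constants are hypothetical, not the authors' code.

```python
import numpy as np

class AdaptiveTimestepSampler:
    """Illustrative sketch: maintain a per-timestep estimate of how much a
    gradient update at that timestep improves the objective, and sample
    timesteps in proportion to that estimate. Assumed, not the paper's code."""

    def __init__(self, num_timesteps, ema_decay=0.99, mix=0.5):
        self.num_timesteps = num_timesteps
        # EMA of the observed loss improvement for each timestep
        self.impact = np.ones(num_timesteps)
        self.ema_decay = ema_decay
        self.mix = mix  # blend with uniform sampling to keep exploring all t

    def probabilities(self):
        adaptive = self.impact / self.impact.sum()
        uniform = np.full(self.num_timesteps, 1.0 / self.num_timesteps)
        return self.mix * adaptive + (1.0 - self.mix) * uniform

    def sample(self, batch_size, rng):
        # Draw a batch of timesteps from the current non-uniform distribution
        return rng.choice(self.num_timesteps, size=batch_size,
                          p=self.probabilities())

    def update(self, timesteps, loss_decrease):
        # loss_decrease[i]: observed objective improvement for timesteps[i]
        for t, d in zip(timesteps, loss_decrease):
            self.impact[t] = (self.ema_decay * self.impact[t]
                              + (1.0 - self.ema_decay) * max(d, 1e-8))
```

In a training loop, one would call `sample` to pick the batch's timesteps, run the diffusion loss at those timesteps, and feed the measured improvement back through `update` so the distribution adapts online.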
Problem

Research questions and friction points this paper is trying to address.

High-variance timesteps become bottlenecks that hinder convergence under uniform timestep sampling
Training diffusion models to convergence grows computationally intensive as data distributions become more complex
Existing static weighting and heuristic sampling schemes lack robustness across datasets and architectures
Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-uniform timestep sampling that prioritizes optimization-critical timesteps
Online, per-iteration tracking of each timestep's impact on the objective
Importance-sampling-based prioritization without precomputed statistics or architectural changes
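Because the summary mentions importance sampling, the usual correction when drawing timesteps non-uniformly is to reweight each sampled loss so the expected objective matches uniform sampling. A hedged sketch of that standard correction (illustrative; not the authors' exact formulation):

```python
import numpy as np

def importance_weighted_loss(losses, timesteps, probs):
    """When timestep t is drawn with probability probs[t] instead of 1/T,
    scale its loss by 1 / (T * probs[t]) so the estimator of the uniform
    objective stays unbiased. Assumed correction, not the paper's code."""
    T = len(probs)
    weights = 1.0 / (T * probs[timesteps])
    return np.mean(weights * losses)
```

With a uniform sampling distribution the weights are all 1 and this reduces to the plain mean loss, which is a quick sanity check on the correction.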