Adaptive Training Meets Progressive Scaling: Elevating Efficiency in Diffusion Models

📅 2023-12-20
📈 Citations: 1 · Influential: 0
🤖 AI Summary
Existing diffusion models employ a uniform denoising network across all timesteps, overlooking variations in data distribution and task difficulty, which limits both performance and training efficiency. To address this, the authors propose a two-stage divide-and-conquer training framework (TDC Training). First, timesteps are grouped by task similarity and difficulty, and a dedicated denoising subnetwork is assigned to each group. Second, model pruning is cast as a multi-round decision-making problem and solved with Proxy-based Pruning and progressive scaling, yielding efficiently lightweighted per-group models. Evaluated on ImageNet64, TDC improves FID by 1.5 over the original IDDPM, saves about 20% of computational resources, and keeps total training cost below that of a single-model baseline. By enabling timestep-adaptive modeling and efficient architectural customization, TDC offers a scalable, task-aware approach to diffusion model design.
📝 Abstract
Diffusion models have demonstrated remarkable efficacy in various generative tasks, owing to the predictive prowess of the denoising model. Currently, diffusion models employ a uniform denoising model across all timesteps. However, the inherent variations in data distributions at different timesteps lead to conflicts during training, constraining the potential of diffusion models. To address this challenge, we propose a novel two-stage divide-and-conquer training strategy termed TDC Training. It groups timesteps based on task similarity and difficulty, assigning highly customized denoising models to each group, thereby enhancing the performance of diffusion models. Not only does two-stage training avoid the need to train each model separately; the total training cost is even lower than training a single unified denoising model. Additionally, we introduce Proxy-based Pruning to further customize the denoising models. This method transforms the pruning problem of diffusion models into a multi-round decision-making problem, enabling precise pruning of diffusion models. Our experiments validate the effectiveness of TDC Training, demonstrating an FID improvement of 1.5 on ImageNet64 over the original IDDPM while saving about 20% of computational resources.
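To make the first stage concrete, below is a minimal sketch of timestep-grouped denoising in PyTorch. It splits the diffusion horizon into contiguous groups and routes each sample to a group-specific subnetwork. The uniform split, the `GroupedDenoiser` name, and all layer sizes are illustrative assumptions; the paper groups timesteps by task similarity and difficulty with customized architectures.

```python
# Minimal sketch of stage one: timestep-grouped denoising. The uniform split,
# module names, and layer sizes are illustrative assumptions; the paper groups
# timesteps by task similarity and difficulty with customized subnetworks.
import torch
import torch.nn as nn

class GroupedDenoiser(nn.Module):
    """Routes each sample to a group-specific denoiser based on its timestep."""

    def __init__(self, num_timesteps=1000, num_groups=4, dim=64):
        super().__init__()
        self.num_timesteps = num_timesteps
        self.num_groups = num_groups
        # One small denoising subnetwork per timestep group
        # (a stand-in for the paper's customized denoising models).
        self.subnets = nn.ModuleList(
            nn.Sequential(nn.Linear(dim + 1, 128), nn.SiLU(), nn.Linear(128, dim))
            for _ in range(num_groups)
        )

    def group_of(self, t):
        # Contiguous, uniform split of [0, T) into num_groups groups.
        return (t * self.num_groups) // self.num_timesteps

    def forward(self, x, t):
        # Append the normalized timestep as a simple conditioning signal.
        t_feat = (t.float() / self.num_timesteps).unsqueeze(-1)
        out = torch.zeros_like(x)
        group_ids = self.group_of(t)
        for g in range(self.num_groups):
            mask = group_ids == g
            if mask.any():
                inp = torch.cat([x[mask], t_feat[mask]], dim=-1)
                out[mask] = self.subnets[g](inp)
        return out

model = GroupedDenoiser()
x = torch.randn(8, 64)            # batch of noisy samples
t = torch.randint(0, 1000, (8,))  # per-sample timesteps
eps_pred = model(x, t)            # predicted noise, shape (8, 64)
```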
Problem

Research questions and friction points this paper is trying to address.

- Diffusion Models
- Step-dependent Denoising
- Performance Limitation
Innovation

Methods, ideas, or system contributions that make the work stand out.

- TDC Training Method
- Adaptive Training with Progressive Scaling
- Proxy-based Pruning Technique (sketched below)
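The abstract describes Proxy-based Pruning only as "a multi-round decision-making problem", so the following is a hedged, generic sketch of that idea rather than the paper's algorithm: in each round, every candidate removal is scored with a cheap proxy evaluation, and the least damaging removal is committed. The `proxy_score` callable is a hypothetical stand-in (e.g., loss on a small held-out batch).

```python
# Hedged sketch of pruning as a multi-round decision problem. The paper's
# Proxy-based Pruning details are not given in this summary; `proxy_score`
# is a hypothetical stand-in for a cheap proxy evaluation (e.g., loss on a
# small held-out batch).

def prune_multi_round(units, proxy_score, num_rounds):
    """Greedily remove `num_rounds` units, one decision per round.

    units: prunable unit ids (e.g., channel or block indices)
    proxy_score: maps a kept-unit set to a scalar (lower is better)
    """
    kept = set(units)
    for _ in range(num_rounds):
        # Decision step: score every single-unit removal with the proxy.
        best_unit, best_score = None, float("inf")
        for u in kept:
            score = proxy_score(kept - {u})
            if score < best_score:
                best_unit, best_score = u, score
        kept.discard(best_unit)
    return kept

# Toy usage: the proxy prefers keeping high-"importance" units.
importance = {0: 0.9, 1: 0.1, 2: 0.5, 3: 0.05}
score = lambda kept: -sum(importance[u] for u in kept)
print(prune_multi_round(list(importance), score, num_rounds=2))  # -> {0, 2}
```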
Authors

Wenhao Li (University of Sydney)
Xiu Su (Central South University)
Yu Han (Nanjing Forestry University)
Shan You (SenseTime Research): deep learning, multimodal LLM, edge AI
Tao Huang (University of Sydney)
Chang Xu (University of Sydney)