DiffusionBlocks: Blockwise Training for Generative Models via Score-Based Diffusion

📅 2025-06-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large neural networks face severe GPU memory bottlenecks during end-to-end backpropagation, limiting model accessibility and scalable training. To address this, the paper proposes DiffusionBlocks, a framework that partitions a network into independently trainable denoising blocks, each modeled as a score-matching unit within a continuous-time diffusion process. DiffusionBlocks couples blockwise training with diffusion modeling: using the equal cumulative probability mass principle, it adaptively assigns a noise-level range to each block, enabling decoupled optimization of the denoising blocks. On image generation and language modeling tasks, memory consumption is reduced in proportion to the number of blocks, substantially alleviating memory pressure. Moreover, training stability improves markedly, and final performance surpasses standard end-to-end backpropagation baselines. DiffusionBlocks thus establishes a memory-efficient paradigm for large-model training without sacrificing convergence or accuracy.

📝 Abstract
Training large neural networks with end-to-end backpropagation creates significant memory bottlenecks, limiting accessibility to state-of-the-art AI research. We propose $\textit{DiffusionBlocks}$, a novel training framework that interprets neural network blocks as performing denoising operations in a continuous-time diffusion process. By partitioning the network into independently trainable blocks and optimizing noise level assignments based on equal cumulative probability mass, our approach achieves significant memory efficiency while maintaining competitive performance compared to traditional backpropagation in generative tasks. Experiments on image generation and language modeling tasks demonstrate memory reduction proportional to the number of blocks while achieving superior performance. DiffusionBlocks provides a promising pathway for democratizing access to large-scale neural network training with limited computational resources.
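The "equal cumulative probability mass" idea can be sketched as follows: split the noise-level distribution into one interval per block so that each interval carries the same probability mass. This is a minimal sketch assuming a logit-normal distribution over noise time $t \in (0, 1)$, a common choice in diffusion training; the paper's exact distribution and per-block representative may differ.

```python
import math
from statistics import NormalDist

def assign_noise_levels(num_blocks: int) -> list[float]:
    """Assign one representative noise level per block so that each block
    covers an interval of equal cumulative probability mass.

    Assumption: noise time t follows a logit-normal distribution, i.e.
    t = sigmoid(z) with z ~ N(0, 1). Each block's level is the quantile
    at the midpoint of its equal-mass interval.
    """
    nd = NormalDist(mu=0.0, sigma=1.0)
    levels = []
    for b in range(num_blocks):
        p = (b + 0.5) / num_blocks      # midpoint of the b-th equal-mass interval
        z = nd.inv_cdf(p)               # corresponding standard-normal quantile
        t = 1.0 / (1.0 + math.exp(-z))  # sigmoid maps the quantile to (0, 1)
        levels.append(t)
    return levels
```

Because the partition is made in probability space rather than in raw noise-level space, blocks near the mode of the distribution cover narrower noise ranges, so each block sees a comparable share of the training signal.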
Problem

Research questions and friction points this paper is trying to address.

Reduces memory bottlenecks in large neural network training
Enables blockwise training via diffusion-based denoising operations
Improves accessibility to large-scale training with limited resources
Innovation

Methods, ideas, or system contributions that make the work stand out.

Blockwise training via diffusion process
Independent block optimization with noise
Memory-efficient large network training
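The independent block optimization above can be sketched with a toy linear denoising block: each block corrupts the data at its assigned noise level and regresses back toward the clean signal, with no gradients crossing block boundaries. This is a hypothetical simplification (the paper trains neural blocks with score matching in a continuous-time diffusion process), but it shows why peak memory tracks a single block rather than the whole network.

```python
import numpy as np

rng = np.random.default_rng(0)

def train_block(weights, data, noise_level, lr=0.1, steps=200):
    """Train one denoising block in isolation: corrupt the data at the
    block's assigned noise level, then regress the block's output back
    toward the clean signal. Gradients never leave this block, so the
    optimizer state and activations of other blocks are never held in
    memory. (Toy linear block with an MSE objective.)"""
    n = len(data)
    for _ in range(steps):
        noisy = data + rng.normal(scale=noise_level, size=data.shape)
        residual = noisy @ weights - data           # error vs. clean data
        weights -= lr * (noisy.T @ residual) / n    # update only this block
    return weights

# Each block gets its own noise level and is optimized independently;
# the levels here are illustrative (e.g. from an equal-mass assignment).
data = rng.normal(size=(64, 3))
noise_levels = [0.2, 0.5, 0.8]
blocks = [train_block(np.zeros((3, 3)), data, s) for s in noise_levels]
```

Since the blocks never exchange gradients, they could also be trained sequentially on a single device or in parallel on separate devices, which is the source of the memory reduction proportional to the number of blocks.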