An Analytical Theory of Power Law Spectral Bias in the Learning Dynamics of Diffusion Models

📅 2025-03-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates the dynamic evolution of model weights and the generated distribution during diffusion model training, with a focus on quantifying their convergence rates relative to the data spectrum (i.e., the eigenvalue distribution of the data covariance). The authors derive a closed-form solution for the gradient flow of linear denoisers and combine it with Gaussian equivalence, spectral decomposition, and KL-divergence analysis to establish what they describe as the first analytical framework characterizing these training dynamics. The theory reveals that both weight updates and generated-distribution convergence obey an inverse power-law dependence on the data covariance eigenvalues, termed "spectral bias." This bias causes slower learning of low-variance (high-frequency or fine-detail) modes, explaining why early stopping induces perceptual blurring and loss of detail. The power law is empirically validated on both synthetic Gaussian data and real-world image datasets. The findings provide theoretical grounding for early-stopping strategies and inform the design of adaptive, spectrum-aware training schedules.

📝 Abstract
We developed an analytical framework for understanding how the learned distribution evolves during diffusion model training. Leveraging the Gaussian equivalence principle, we derived exact solutions for the gradient-flow dynamics of weights in one- or two-layer linear denoiser settings with arbitrary data. Remarkably, these solutions allowed us to derive the generated distribution in closed form and its KL divergence through training. These analytical results expose a pronounced power-law spectral bias, i.e., for weights and distributions, the convergence time of a mode follows an inverse power law of its variance. Empirical experiments on both Gaussian and image datasets demonstrate that the power-law spectral bias remains robust even when using deeper or convolutional architectures. Our results underscore the importance of the data covariance in dictating the order and rate at which diffusion models learn different modes of the data, providing potential explanations for why earlier stopping could lead to incorrect details in image generative models.
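The core claim above, that a mode's convergence time follows an inverse power law of its variance, can be illustrated with a minimal numerical sketch. This is not the paper's setup but a simplified, assumed analogue: a one-layer diagonal linear denoiser trained by gradient descent on the population loss E||W y − x||², with x drawn from a zero-mean Gaussian with diagonal covariance and y = x + Gaussian noise. In this toy case the per-mode gradient is 2((λᵢ + σ²)wᵢ − λᵢ), so mode i converges at a rate proportional to (λᵢ + σ²), and low-variance modes are learned last. All eigenvalues and hyperparameters below are illustrative choices, not values from the paper.

```python
import numpy as np

# Toy sketch (assumed setup, not the paper's): a diagonal linear denoiser
# W recovering x from y = x + noise, x ~ N(0, diag(lams)), noise ~ N(0, sigma2*I).
# Population loss E||W y - x||^2 has per-mode gradient 2*((lam + sigma2)*w - lam),
# so each mode relaxes exponentially at rate ~ (lam + sigma2): the spectral bias.

lams = np.array([4.0, 1.0, 0.25])   # data covariance eigenvalues (illustrative)
sigma2 = 0.01                        # noise variance (illustrative)
w = np.zeros_like(lams)              # one diagonal weight per mode
w_opt = lams / (lams + sigma2)       # optimal (Wiener-filter) weights
lr = 1e-3

# Record the first step at which each mode reaches 90% of its optimum.
steps_to_90 = np.full(len(lams), -1)
for t in range(200_000):
    grad = 2 * (w * (lams + sigma2) - lams)   # population gradient per mode
    w -= lr * grad
    hit = (w >= 0.9 * w_opt) & (steps_to_90 < 0)
    steps_to_90[hit] = t

print("steps to 90% per mode:", steps_to_90)
# If convergence time ~ 1/(lam + sigma2), this product is roughly constant:
print("steps * (lam + sigma2):", steps_to_90 * (lams + sigma2))
```

Running this, the high-variance mode converges fastest and the product steps × (λ + σ²) is nearly identical across modes, matching the inverse power-law scaling the abstract describes (here with exponent 1, since the toy loss is quadratic).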
Problem

Research questions and friction points this paper is trying to address.

How do model weights and the generated distribution evolve during diffusion model training, and at what rates do different data modes converge?
Can the gradient-flow dynamics of linear denoisers be solved exactly, and can the generated distribution and its KL divergence be obtained in closed form?
Why does early stopping produce blurred generations with incorrect fine details?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Closed-form analytical framework for diffusion model training dynamics
Gaussian equivalence principle yielding exact gradient-flow solutions for linear denoisers
Power-law spectral bias governing the order and rate of mode-wise convergence