🤖 AI Summary
What score function do diffusion models learn? Theory shows that perfectly matching the empirical score would merely memorize the training data, undermining generative capability. This work identifies “selective underfitting” as the key mechanism enabling generalization: the model fits the empirical score unevenly across input space, approximating it with high fidelity in semantically critical regions while deliberately underfitting it in redundant or noisy regions. The authors characterize this mechanism, grounding it in score-matching theory and inductive-bias analysis, and design interpretable, empirically grounded interventions to locate and validate the underfitted regions. Experiments demonstrate that selective underfitting unifies the explanation of diffusion models’ trade-offs among sample diversity, fidelity, and out-of-distribution generalization, yielding a testable, theoretically principled account of their generative behavior.
📝 Abstract
Diffusion models have emerged as the principal paradigm for generative modeling across domains. During training, they learn the score function, which is then used to generate samples at inference. This setup raises a basic yet unresolved question: which score do they actually learn? In principle, a diffusion model that matched the empirical score everywhere in data space would simply reproduce the training data, failing to generate novel samples. Recent work addresses this question by arguing that diffusion models underfit the empirical score due to training-time inductive biases. In this work, we refine this perspective by introducing the notion of selective underfitting: rather than underfitting the score everywhere, better diffusion models approximate the score more accurately in certain regions of input space while underfitting it in others. We characterize these regions and design empirical interventions to validate our perspective. Our results establish that selective underfitting is essential for understanding diffusion models, yielding new, testable insights into their generalization and generative performance.
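The memorization claim in the abstract has a concrete closed form: for a training set {x_i} smoothed with Gaussian noise at level σ, the empirical score of the mixture (1/N) Σ_i N(x; x_i, σ²I) is a softmax-weighted pull toward the training points, and following it while annealing σ → 0 collapses any initial point onto a training example. The sketch below illustrates this with toy data; the step size, noise schedule, and training set are illustrative choices, not taken from the paper.

```python
import numpy as np

def empirical_score(x, data, sigma):
    # Score of the Gaussian-smoothed empirical distribution
    #   p_sigma(x) = (1/N) * sum_i N(x; x_i, sigma^2 I),
    # which has the closed form
    #   grad log p_sigma(x) = sum_i w_i(x) * (x_i - x) / sigma^2,
    # with softmax weights w_i(x) proportional to exp(-||x - x_i||^2 / (2 sigma^2)).
    d2 = np.sum((data - x) ** 2, axis=1)           # squared distances to each x_i
    w = np.exp(-(d2 - d2.min()) / (2 * sigma**2))  # numerically stable softmax numerator
    w /= w.sum()
    return (w @ (data - x)) / sigma**2

rng = np.random.default_rng(0)
data = rng.normal(size=(5, 2))  # toy "training set" of 5 points in 2D
x = rng.normal(size=2)          # arbitrary starting point

# Ascend the exact empirical score while annealing the noise level:
# as sigma shrinks, the weights concentrate on the nearest training point.
for sigma in np.linspace(1.0, 0.05, 200):
    x = x + 0.5 * sigma**2 * empirical_score(x, data, sigma)

nearest = data[np.argmin(np.sum((data - x) ** 2, axis=1))]
print("final sample:", x, "nearest training point:", nearest)
```

Each update moves x halfway toward the softmax-weighted mean of the training points, so the trajectory ends on (a point arbitrarily close to) a training example, which is exactly the memorization failure mode the abstract describes for a model that matched the empirical score everywhere.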