Diffusion Models Observe Only Gradients: A Geometric Perspective on Score Matching Errors

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work argues that the conventional L² score-matching error is a poor metric for evaluating diffusion models, because it can fail to reflect the true discrepancy between the generated and target distributions. Taking a geometric view, the authors apply a Helmholtz–Hodge decomposition to the score error and show that only its gradient component influences the evolution of the marginal distribution, while the solenoidal (curl-like) component is invisible to the marginal Fokker–Planck dynamics. Consequently, a model can incur arbitrarily large L² error while matching the target distribution exactly, and no monotone function of the L² error can uniformly lower bound a divergence between the learned and target distributions. The paper derives an upper bound on the KL divergence that depends only on the gradient component and tightens the standard Girsanov bound. It also gives a tractable estimator of the gradient component via a dual Sobolev identity, which empirically correlates substantially better with sample quality than the full L² error.
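
The mechanism can be sketched in a few lines, under assumptions that are not taken from the paper: a generic sampler $dX_t = b_t(X_t)\,dt + \sigma\,dW_t$ whose drift is perturbed by an error field $e_t$, with marginals $q_t$. The symbols $b_t$, $\sigma$, $e_t$, $\phi_t$, and $r_t$ are illustrative, and the paper's exact decomposition (for example its weighting or inner product) may differ. Suppose the error splits into a gradient part and a part that is divergence-free with respect to $q_t$:

$$
e_t = \nabla \phi_t + r_t, \qquad \nabla \cdot (q_t\, r_t) = 0 .
$$

The marginal Fokker-Planck equation of the perturbed sampler then reads

$$
\partial_t q_t
= -\nabla \cdot \big( q_t\,(b_t + e_t) \big) + \tfrac{\sigma^2}{2}\,\Delta q_t
= -\nabla \cdot \big( q_t\,(b_t + \nabla \phi_t) \big) + \tfrac{\sigma^2}{2}\,\Delta q_t ,
$$

so $r_t$ drops out entirely. In particular, if $\phi_t$ is constant, the perturbed and unperturbed samplers satisfy the same marginal equation, and (given the same initial law and uniqueness of solutions) share the same marginals no matter how large $\|r_t\|_{L^2}$ is.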

📝 Abstract

Score-based diffusion models are typically trained by minimizing the $L^2$ score matching error, and standard theoretical analyses rely on this quantity to bound the sampling discrepancy between the learned and target distributions. We show the $L^2$ score error is not the right intrinsic measure of marginal distributional quality: a learned diffusion model can incur arbitrarily large $L^2$ score error while perfectly matching the target distribution. By decomposing score errors into a gradient and a solenoidal component (a Helmholtz-Hodge decomposition), we identify the geometric reason behind this: only the gradient component enters the marginal Fokker-Planck dynamics, while the solenoidal component is structurally invisible. We make this precise in three results. First, building on the corrected geometry, we prove an impossibility result: no monotone function of the $L^2$ score error can uniformly lower bound any divergence between the learned and target distributions. Second, we derive an upper bound on the Kullback-Leibler divergence that depends only on the observable gradient component of the error, tightening the standard Girsanov bound and identifying its looseness as the cost of operating on path-space rather than marginal-space dynamics. Third, we give a tractable estimator of the gradient component via a dual Sobolev identity, which is shown to empirically correlate substantially better with sample quality than the full $L^2$ error.
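
The claim that the $L^2$ score error can be large while the marginal is unaffected can be checked on a toy case. The sketch below is not the paper's experiment or estimator. It assumes a stationary Langevin sampler $dX = s(X)\,dt + \sqrt{2}\,dW$ targeting $N(0, I)$ in two dimensions, with true score $s(x) = -x$, and compares two linear score errors of equal $L^2$ size: a rotational one, $cJx$, and a gradient one, $-cx$. All names (`stationary_cov`, `kl_to_standard_normal`, `c`) are hypothetical.

```python
import numpy as np

J = np.array([[0.0, -1.0], [1.0, 0.0]])  # generator of planar rotations
I2 = np.eye(2)

def stationary_cov(A, sigma2=2.0):
    """Stationary covariance S of dX = A X dt + sqrt(sigma2) dW.

    Solves the Lyapunov equation A S + S A^T + sigma2 * I = 0
    (A is assumed stable, i.e. all eigenvalues have negative real part).
    """
    d = A.shape[0]
    I = np.eye(d)
    # With row-major flattening, vec(A S + S A^T) = (A kron I + I kron A) vec(S).
    K = np.kron(A, I) + np.kron(I, A)
    s = np.linalg.solve(K, -sigma2 * I.reshape(-1))
    return s.reshape(d, d)

def kl_to_standard_normal(S):
    """KL( N(0, S) || N(0, I) ) for a covariance matrix S."""
    d = S.shape[0]
    return 0.5 * (np.trace(S) - d - np.log(np.linalg.det(S)))

def l2_error(E):
    """E || E x ||^2 under x ~ N(0, I), for a linear error field x -> E x."""
    return np.trace(E.T @ E)

c = 5.0                      # size of the score error (arbitrary choice)
E_rot, E_grad = c * J, -c * I2

# Learned score = true score (-x) plus the error field.
S_rot = stationary_cov(-I2 + E_rot)
S_grad = stationary_cov(-I2 + E_grad)

l2_rot, l2_grad = l2_error(E_rot), l2_error(E_grad)
print("L2 error (rotational, gradient):", l2_rot, l2_grad)
print("KL to target (rotational):", kl_to_standard_normal(S_rot))
print("KL to target (gradient):  ", kl_to_standard_normal(S_grad))
```

Both error fields have the same $L^2$ size under the target, but only the gradient one moves the stationary distribution: the rotational field satisfies $\nabla \cdot (p\,Jx) = 0$ for the Gaussian density $p$, so the stationary covariance stays at the identity for any `c`.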

Problem

Research questions and friction points this paper is trying to address.

score matching

diffusion models

Helmholtz-Hodge decomposition

distributional discrepancy

gradient component

Innovation

Methods, ideas, or system contributions that make the work stand out.

score-based diffusion models

Helmholtz-Hodge decomposition

gradient component