🤖 AI Summary
This work addresses the lack of theoretical guarantees for classifier guidance in diffusion models under standard classifier training paradigms. Under a smoothness assumption on the classifier, the paper establishes the first quantitative relationship between the classifier’s cross-entropy error and the error in the guidance vector, and a counterexample shows that the smoothness assumption cannot be dropped. Combining a conditional KL divergence and mean squared error analysis with ideas reminiscent of reverse logarithmic Sobolev-type inequalities, the authors prove that when the classifier’s conditional KL divergence is ε², the mean squared error of the guidance vector scales as Õ(dε). This result yields a theoretically grounded criterion for selecting classifiers suitable for guidance in diffusion-based generation.
📝 Abstract
Classifier-guided diffusion models generate conditional samples by augmenting the reverse-time score with the gradient of the log-probability predicted by a probabilistic classifier. In practice, this classifier is usually obtained by minimizing an empirical loss function. While existing statistical theory guarantees good generalization performance when the sample size is sufficiently large, it remains unclear whether such training yields an effective guidance mechanism. We study this question in the context of cross-entropy loss, which is widely used for classifier training. Under mild smoothness assumptions on the classifier, we show that controlling the cross-entropy at each diffusion model step is sufficient to control the corresponding guidance error. In particular, probabilistic classifiers achieving conditional KL divergence $\varepsilon^2$ induce guidance vectors with mean squared error $\widetilde O(d\varepsilon)$, up to constant and logarithmic factors. Our result yields an upper bound on the sampling error of classifier-guided diffusion models and bears resemblance to a reverse log-Sobolev--type inequality. To the best of our knowledge, this is the first result that quantitatively links classifier training to guidance alignment in diffusion models, providing both a theoretical explanation for the empirical success of classifier guidance and principled guidelines for selecting classifiers that induce effective guidance.
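The guidance mechanism the abstract describes can be seen in a closed-form toy example (our own sketch, not a construction from the paper): for a two-component Gaussian mixture with a Bayes-optimal classifier, adding the classifier's log-probability gradient to the unconditional score recovers the conditional score exactly, which is the identity an imperfect trained classifier only approximates.

```python
import numpy as np

# Toy setting (illustrative assumption, not from the paper):
# x | y=1 ~ N(mu, I), x | y=0 ~ N(-mu, I), equal class priors.
# All scores below are available in closed form, so we can verify
# that  unconditional score + classifier gradient = conditional score.

rng = np.random.default_rng(0)
mu = np.array([1.5, -0.5])
x = rng.normal(size=2)  # an arbitrary query point

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Bayes-optimal probabilistic classifier: p(y=1 | x) = sigmoid(2 mu . x)
p1 = sigmoid(2.0 * mu @ x)

# Unconditional score grad_x log p(x) of the two-component mixture,
# a posterior-weighted average of the per-class Gaussian scores
score_uncond = p1 * (mu - x) + (1.0 - p1) * (-mu - x)

# Classifier guidance term grad_x log p(y=1 | x) = 2 mu (1 - p(y=1 | x))
guidance = 2.0 * mu * (1.0 - p1)

# Guided score matches the conditional score grad_x log p(x | y=1) = mu - x
guided = score_uncond + guidance
print(np.allclose(guided, mu - x))  # → True
```

With a learned classifier, `guidance` carries an error; the paper's result bounds the mean squared size of that error in terms of the classifier's conditional KL divergence (cross-entropy excess risk), under smoothness.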