🤖 AI Summary
This work addresses instability and artifacts in large-step inference along diffusion probability flow ODE (PF-ODE) trajectories, where large integration steps combined with highly expressive denoisers can undermine the contraction-based stability guarantees that keep errors suppressed. The authors call this tension the contractivity trap. To mitigate it, they propose SteinDiff, an inference-time framework that applies a geometry-aware residual correction based on Stein’s method at each solver step, regularizing updates without reference samples or retraining. The paper derives a closed-form expression for the correction coefficient, establishes a score-controlled perturbation bound under distribution shift, and offers a complementary Stein perspective on EDM-style parameterizations. Empirically, SteinDiff is reported to mitigate severe artifacts and improve sample quality across large-step inference settings.
📝 Abstract
A fundamental tension exists in the large-step inference of diffusion models via their deterministic probability flow ordinary differential equation (PF-ODE) trajectories, which we identify as the contractivity trap: efficient inference favors large step sizes, while aggressive steps and highly expressive denoisers can undermine contraction-based stability certificates for error suppression. To address this, we propose SteinDiff, a step-wise inference-time stabilization framework that employs Stein-derived corrections without requiring reference samples. Specifically, SteinDiff introduces a geometry-aware residual correction mechanism that regularizes large-step solver updates without retraining. To this end, we derive a closed-form Stein correction coefficient for step-wise solver adjustment, enabling reference-free adaptation to local data geometry. We further establish a score-controlled perturbation bound under distributional shifts and provide a complementary Stein perspective on EDM-style parameterizations. Extensive experiments demonstrate that SteinDiff mitigates severe artifacts and improves generative quality across large-step inference settings.
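The tension between step size and contraction described in the abstract can be illustrated with a standard toy problem from numerical analysis. The sketch below is not SteinDiff and not the paper's PF-ODE analysis. It assumes a scalar linear ODE dx/dt = -lam * x, whose exact flow always shrinks the gap between two trajectories, and shows that explicit Euler keeps that property only when the step size h is below 2 / lam. The function names and the values of `lam` and `h` are illustrative assumptions.

```python
def euler_step(x, lam, h):
    """One explicit Euler step for the toy ODE dx/dt = -lam * x (assumed, not from the paper)."""
    return x + h * (-lam * x)


def contraction_factor(lam, h):
    """Per-step factor by which explicit Euler scales the gap between two trajectories."""
    return abs(1.0 - h * lam)


def gap_after(n_steps, lam, h, x_a=1.0, x_b=1.1):
    """Distance between two nearby trajectories after n_steps Euler steps."""
    for _ in range(n_steps):
        x_a = euler_step(x_a, lam, h)
        x_b = euler_step(x_b, lam, h)
    return abs(x_a - x_b)


if __name__ == "__main__":
    lam = 10.0  # the exact flow is contractive for any lam > 0
    for h in (0.05, 0.15, 0.25):  # stability threshold is h < 2 / lam = 0.2
        factor = contraction_factor(lam, h)
        regime = "contractive" if factor < 1.0 else "expansive"
        print(f"h={h:.2f}  factor={factor:.2f}  ({regime})  gap after 5 steps={gap_after(5, lam, h):.4g}")
```

In this toy setting the continuous dynamics are contractive for every step size, but the discretized update amplifies errors once h exceeds the threshold. This mirrors, in a much simpler form, how aggressive steps can void a contraction-based stability certificate.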