A Quantitative Approximation Framework for Flow Distillation in Diffusion Models

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work addresses error amplification in few-step sampling of diffusion models. The amplification is especially severe under the dynamical stiffness that arises in low-noise, multimodal regions. The authors model few-step sampling as error propagation under compositions of learned flow maps. They separate the problem into two core difficulties: approximating the time-dependent score field and controlling dynamical error amplification. Their quantitative approximation framework has three ingredients: $L^p(p_t)$ approximation guarantees for ReLU–ReQU networks, an explicit bound $L(t)$ on the spatial Lipschitz constant of the probability-flow ODE velocity, and a flow-map stability estimate governed by the time-integrated bound $\int_s^t L(u)\,du$. From these, the authors derive a non-uniform time grid by partitioning the cumulative stability coordinate uniformly. Theoretically, they prove that network depth and width scale polylogarithmically in the target approximation accuracy. The analysis also identifies a Lipschitz-mismatch regime in which one-step distillation is structurally unfavorable. Experiments show that an 8-segment stability-balanced grid reduces end-to-end relative MSE by up to 51.9% compared with uniform grids.
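The stability-balanced grid can be sketched numerically in three steps. First, accumulate $\Lambda(t)=\int_{t_{\min}}^{t} L(u)\,du$ with the trapezoid rule. Next, split $[0, \Lambda(t_{\max})]$ into equal pieces. Finally, map the breakpoints back to time. This is a minimal sketch under stated assumptions, not the authors' implementation. The function name `stability_balanced_grid` and the quadrature resolution are hypothetical. So is the stand-in $L(t) = 1 + 1/t$, chosen so that stiffness grows as the noise level (here small $t$) goes to zero; the paper's actual bound $L(t)$ would be plugged in instead.

```python
import numpy as np


def stability_balanced_grid(L, t_min, t_max, n_segments, n_quad=10_001):
    """Split [t_min, t_max] so each segment carries an equal share of the
    cumulative stability coordinate Lambda(t) = int_{t_min}^t L(u) du.

    Hypothetical helper: L must be a vectorised, strictly positive function
    of time (a stand-in for the paper's Lipschitz bound).
    """
    ts = np.linspace(t_min, t_max, n_quad)
    Ls = L(ts)
    # Cumulative trapezoid rule, with Lambda(t_min) = 0.
    lam = np.concatenate(([0.0], np.cumsum(0.5 * (Ls[1:] + Ls[:-1]) * np.diff(ts))))
    # Uniform partition in the Lambda coordinate ...
    targets = np.linspace(0.0, lam[-1], n_segments + 1)
    # ... mapped back to time by inverting the increasing map t -> Lambda(t).
    return np.interp(targets, lam, ts)
```

Take the hypothetical $L(t) = 1 + 1/t$ on $[0.01, 1]$ with 8 segments. The first breakpoint lands near $t \approx 0.02$, so the grid concentrates its steps in the stiff low-noise region. A uniform 8-segment grid would instead place roughly half of the total $\Lambda$ in its first segment.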

📝 Abstract

We develop a quantitative approximation framework for diffusion distillation, viewing few-step sampling as error propagation under compositions of learned flow maps. Focusing on trajectory distillation for the probability-flow ODE, we show that local approximation errors can be strongly amplified in low-noise multimodal regimes, where the underlying dynamics become stiff. In an analytically tractable Gaussian-mixture Ornstein–Uhlenbeck setting, we separate two core difficulties: approximating the time-dependent score field and controlling the dynamical amplification governed by the time-integrated Jacobian bound of the probability-flow ODE. On the approximation side, we prove constructive $L^p(p_t)$ guarantees showing that ReLU–ReQU networks approximate the Gaussian-mixture score uniformly over time, with depth and width scaling polylogarithmically in the target accuracy and explicitly with the mixture geometry. On the stability side, we derive an explicit bound $L(t)$ for the spatial Lipschitz constant of the probability-flow velocity and convert it into a flow-map stability estimate governed by $\int_s^t L(u)\,du$, making late-time amplification in stiff regimes computable. Building on these estimates, we prove that deep residual compositions efficiently approximate the long-horizon transport, with global error controlled by the stability amplification factor, and identify a Lipschitz-mismatch regime in which one-step distillation is structurally unfavorable. The resulting theory yields a stability-balanced non-uniform time grid obtained by uniform partitioning in the cumulative stability coordinate. Experiments support the prediction and reduce end-to-end relative MSE by up to 51.9% with 8 segments compared with uniform grids.
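The stiffness mechanism in the Gaussian-mixture Ornstein–Uhlenbeck setting can be checked numerically in one dimension. The sketch assumes the forward process $dX_t = -X_t\,dt + \sqrt{2}\,dW_t$. Under this process, each mixture component stays Gaussian with mean $m_k e^{-t}$ and variance $s_k^2 e^{-2t} + 1 - e^{-2t}$, and the probability-flow velocity is $v(x,t) = -x - \partial_x \log p_t(x)$. The mixture parameters, grid ranges, and helper names are hypothetical. `empirical_lipschitz` is only a grid-based proxy for $\sup_x |\partial_x v(x,t)|$, not the explicit bound $L(t)$ derived in the paper. Integrating the proxy over time gives the exponent of the Grönwall-type amplification factor $\exp(\int_s^t L(u)\,du)$.

```python
import numpy as np

# Hypothetical 1D setup: two narrow, well-separated Gaussian modes as data,
# forward OU process dX = -X dt + sqrt(2) dW (stationary law N(0, 1)).
WEIGHTS = np.array([0.5, 0.5])
MEANS = np.array([-3.0, 3.0])
STDS = np.array([0.1, 0.1])


def component_params(t):
    """Mean and variance of each mixture component of p_t under the OU flow."""
    decay = np.exp(-t)
    mu = MEANS * decay
    var = STDS**2 * decay**2 + (1.0 - decay**2)
    return mu, var


def score_and_dscore(x, t):
    """Return d/dx log p_t(x) and d^2/dx^2 log p_t(x) for scalar or array x."""
    mu, var = component_params(t)
    x = np.atleast_1d(np.asarray(x, dtype=float))[:, None]  # shape (n, 1) vs (K,)
    logc = np.log(WEIGHTS) - 0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)
    logc -= logc.max(axis=1, keepdims=True)  # stabilise before exponentiating
    r = np.exp(logc)
    r /= r.sum(axis=1, keepdims=True)  # posterior responsibilities r_k(x)
    g = -(x - mu) / var  # per-component scores
    score = (r * g).sum(axis=1)
    # Second derivative of a log-mixture: mean component curvature + score variance.
    dscore = (r * (-1.0 / var)).sum(axis=1) + (r * g**2).sum(axis=1) - score**2
    return score, dscore


def velocity(x, t):
    """Probability-flow ODE drift for this OU process: dx/dt = -x - score."""
    xs = np.atleast_1d(np.asarray(x, dtype=float))
    return -xs - score_and_dscore(xs, t)[0]


def empirical_lipschitz(t, xs=np.linspace(-6.0, 6.0, 4001)):
    """Grid proxy for L(t): max |d velocity / dx| over a finite set of x."""
    _, dscore = score_and_dscore(xs, t)
    return float(np.max(np.abs(-1.0 - dscore)))


def log_amplification(s, t, n=401):
    """Trapezoid estimate of int_s^t L(u) du using the empirical proxy."""
    us = np.linspace(s, t, n)
    Ls = np.array([empirical_lipschitz(u) for u in us])
    return float(np.sum(0.5 * (Ls[1:] + Ls[:-1]) * np.diff(us)))
```

The sketch uses modes at $\pm 3$ with standard deviation $0.1$. For these, the proxy at $t = 0.05$ is several hundred times larger than at $t = 1$. The peak sits between the modes, where the responsibilities switch sharply, so most of the amplification accumulates in the final, low-noise stretch of sampling.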

Problem

Research questions and friction points this paper is trying to address.

flow distillation

diffusion models

error amplification

probability-flow ODE

stiff dynamics

Innovation

Methods, ideas, or system contributions that make the work stand out.

flow distillation

probability-flow ODE

score approximation