🤖 AI Summary
Learned ODE solvers for generative models are span-limited in few-step sampling: their scalar-coefficient updates lie in the linear span of the buffered velocity evaluations, so they cannot correct out-of-span residuals, which limits generation quality. This work proposes SpanLift, a lightweight neural solver that, without altering the backbone model or the base solver, augments scalar-coefficient updates with a transferable spatial residual operator to overcome this bottleneck. The operator is trained via endpoint teacher matching, acts on the buffered state and velocities, and requires no additional model evaluations. With only 3 function evaluations (NFE), SpanLift reduces the CIFAR-10 FID from 8.16 to 5.69 and the ImageNet FID from 17.37 to 11.83, achieving state-of-the-art few-step sampling across pixel-space diffusion, latent flow matching, and precipitation nowcasting.
📝 Abstract
Diffusion and flow generative models sample by integrating a learned ODE, but high quality still requires many sequential model evaluations. Solver learning reduces this cost by adapting scalar coefficients, timesteps, or both, while keeping the backbone model fixed. In this work, we identify a structural bottleneck in this update family: each step remains span-limited. Since the scalar-coefficient update lies in the span of buffered velocity evaluations, it can fit only the in-span component while leaving any out-of-span residual unreachable by scalar recombination alone. We propose SpanLift, a lightweight neural solver that augments scalar-coefficient updates with a spatial residual operator. SpanLift keeps a fixed base solver as an in-span prior and learns a spatial residual operator over the state and velocity buffer. The operator is trained by endpoint teacher matching, preserves the pretrained backbone, and adds no model NFEs. Empirically, the learned correction transfers across base solvers and is predominantly out-of-span. Across pixel-space diffusion, latent flow matching, and precipitation nowcasting, SpanLift achieves state-of-the-art few-step sampling. With only 3 NFE, it improves CIFAR-10 FID from 8.16 to 5.69 and ImageNet FID from 17.37 to 11.83.
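The span-limitation argument can be illustrated with a toy example. The sketch below is not the SpanLift method and uses no code from the paper. It assumes a hypothetical buffer of two velocity vectors in R^3 and a hypothetical target update, then fits the best scalar coefficients by least squares. The function name `best_scalar_update` and all numeric values are illustrative assumptions. The point is that whatever part of the target is orthogonal to the buffer stays in the residual, no matter how the coefficients are chosen.

```python
# Toy illustration (assumed values, not from the paper): a scalar-coefficient
# update can only reach the span of the buffered velocities.

def dot(a, b):
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def best_scalar_update(buffer, target):
    """Least-squares fit target ~ c1*v1 + c2*v2 for a two-vector buffer.

    Solves the 2x2 normal equations in closed form and returns the
    coefficients, the in-span component, and the out-of-span residual.
    Assumes v1 and v2 are linearly independent (non-zero determinant).
    """
    v1, v2 = buffer
    g11, g12, g22 = dot(v1, v1), dot(v1, v2), dot(v2, v2)
    b1, b2 = dot(v1, target), dot(v2, target)
    det = g11 * g22 - g12 * g12
    c1 = (g22 * b1 - g12 * b2) / det
    c2 = (g11 * b2 - g12 * b1) / det
    in_span = [c1 * a + c2 * b for a, b in zip(v1, v2)]
    residual = [t - s for t, s in zip(target, in_span)]
    return (c1, c2), in_span, residual

# Hypothetical buffered velocities: both lie in the plane z = 0.
buffer = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
# Hypothetical ideal update with a component outside that plane.
target = [2.0, 3.0, 0.5]

coeffs, in_span, residual = best_scalar_update(buffer, target)
print(coeffs)    # (-1.0, 3.0)
print(in_span)   # [2.0, 3.0, 0.0]
print(residual)  # [0.0, 0.0, 0.5]
```

In this toy setting the residual `[0.0, 0.0, 0.5]` is orthogonal to both buffered vectors, so no choice of scalar coefficients can remove it. Under the abstract's framing, this is the component a correction beyond scalar recombination would have to supply.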