🤖 AI Summary
Existing video diffusion acceleration methods rely on uniform heuristics or time-embedding variants and require extensive prompt-specific calibration, which leads to prompt overfitting and inconsistent outputs. This work identifies a universal monotonic decay law governing residual magnitude that holds across models and prompts, and proposes an adaptive, magnitude-aware caching mechanism calibrated from only a single sample: it models reconstruction error via residual magnitude ratios to enable dynamic timestep skipping and adaptive feature reuse. Evaluated on Open-Sora and Wan 2.1, the method achieves 2.1× and 2.68× inference speedups, respectively, while consistently outperforming state-of-the-art approaches in LPIPS, SSIM, and PSNR. The core contribution is the first characterization of this universal residual magnitude decay law, which enables a lightweight, highly generalizable, single-sample adaptive acceleration paradigm for video diffusion models.
📝 Abstract
Existing acceleration techniques for video diffusion models often rely on uniform heuristics or time-embedding variants to skip timesteps and reuse cached features. These approaches typically require extensive calibration with curated prompts and risk inconsistent outputs due to prompt-specific overfitting. In this paper, we present a novel and robust discovery: a unified magnitude law that holds across different models and prompts. Specifically, the magnitude ratio of successive residual outputs decreases monotonically and steadily over most timesteps, then drops rapidly in the last several steps. Leveraging this insight, we introduce a Magnitude-aware Cache (MagCache) that adaptively skips unimportant timesteps using an error-modeling mechanism and an adaptive caching strategy. Unlike existing methods that require dozens of curated samples for calibration, MagCache needs only a single sample. Experimental results show that MagCache achieves 2.1x and 2.68x speedups on Open-Sora and Wan 2.1, respectively, while preserving superior visual fidelity. It significantly outperforms existing methods in LPIPS, SSIM, and PSNR under comparable computational budgets.
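The skip-and-reuse idea in the abstract can be illustrated with a minimal sketch. Everything below is hypothetical and not the authors' exact algorithm: it assumes a per-step residual magnitude-ratio curve `mag_ratios` measured once on a single calibration sample, models accumulated reconstruction error as the deviation of the product of skipped ratios from 1.0, and skips a timestep only while that error stays under a budget (the names, error model, and thresholds are illustrative).

```python
# Hypothetical sketch of a magnitude-aware caching schedule (not the
# paper's exact formulation). A step is skipped (cached residual reused)
# while the modeled accumulated error stays under `err_budget` and the
# consecutive-skip limit is not exceeded.

def plan_skips(mag_ratios, err_budget=0.1, max_consecutive_skips=2):
    """Return a boolean list: True = reuse cached residual at that step."""
    skip = []
    acc_err = 0.0      # accumulated modeled error since the last full step
    acc_ratio = 1.0    # product of ratios over the currently skipped steps
    consecutive = 0
    for r in mag_ratios:
        acc_ratio *= r
        est_err = abs(1.0 - acc_ratio)   # modeled reconstruction error
        if acc_err + est_err <= err_budget and consecutive < max_consecutive_skips:
            skip.append(True)            # skip: reuse the cached residual
            acc_err += est_err
            consecutive += 1
        else:                            # compute this step fully, reset cache
            skip.append(False)
            acc_err, acc_ratio, consecutive = 0.0, 1.0, 0
    return skip

# Toy ratio curve: near 1.0 early (safe to skip), decaying sharply late,
# mirroring the decay law described in the abstract.
ratios = [0.99, 0.98, 0.97, 0.95, 0.90, 0.80, 0.60, 0.40]
print(plan_skips(ratios))
# → [True, True, False, True, False, False, False, False]
```

Consistent with the observed decay law, such a schedule skips mostly early and middle timesteps, where successive residuals barely change, and computes the final steps fully, where the ratio drops rapidly and reuse would be costly.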