🤖 AI Summary
This work addresses the high computational cost of diffusion model inference caused by iterative denoising. Existing feature-caching schedules are either fixed or set by hand-tuned thresholds on per-step error heuristics, so their compute cost is a side effect rather than a budget the user can specify. To this end, we propose ReCache, an unsupervised reinforcement learning approach that treats the compute budget as a controllable input and uses REINFORCE-based policy gradients to learn which denoising steps to fully recompute. Our method requires no labeled data, avoids backpropagating through full diffusion inference, and is compatible with diverse caching mechanisms (feature reuse and feature forecasting). It eliminates manual threshold tuning, and a single trained policy adapts across budgets at inference time. Experiments demonstrate significant gains: at a 5.04× FLOPs reduction on FLUX, it lowers LPIPS by 31% relative to DiCache; on Wan 2.1 at a ~2.6× speedup, it reduces LPIPS by 65% and improves the VBench score by 7% relative to uniform HiCache.
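The REINFORCE idea can be illustrated with a toy, self-contained sketch. Everything below is a hypothetical stand-in, not the ReCache implementation. The policy is one learnable logit per denoising step. A schedule for budget k is drawn by sampling k distinct steps without replacement (Plackett-Luce sampling, chosen here for simplicity). `toy_reward` replaces the paper's quality reward with a penalty for every step served from the cache, weighted by a made-up per-step `importance`. The update multiplies the advantage (reward minus the batch-mean baseline) by the gradient of the schedule's log-probability, so no gradient flows through any denoising computation.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax restricted to entries where mask is True; masked entries get 0."""
    m = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, mask)]
    z = sum(exps)
    return [e / z for e in exps]


def sample_schedule(logits, k, rng):
    """Sample k distinct steps to recompute (Plackett-Luce, without replacement).

    Returns the sorted schedule and d log p(schedule) / d logits for REINFORCE.
    """
    n = len(logits)
    assert 0 < k <= n
    mask = [True] * n
    chosen, grad = [], [0.0] * n
    for _ in range(k):
        probs = masked_softmax(logits, mask)
        step = rng.choices(range(n), weights=probs)[0]
        # d log softmax(step) / d logits = one_hot(step) - probs (remaining steps only)
        for i in range(n):
            grad[i] -= probs[i]
        grad[step] += 1.0
        chosen.append(step)
        mask[step] = False
    return sorted(chosen), grad


def toy_reward(schedule, importance):
    """Hypothetical reward: minus the 'error' of every step served from cache."""
    recomputed = set(schedule)
    return -sum(w for i, w in enumerate(importance) if i not in recomputed)


def train(importance, budgets, iters=1000, batch=16, lr=0.05, seed=0):
    """REINFORCE with a batch-mean baseline; the budget k is resampled per batch."""
    rng = random.Random(seed)
    n = len(importance)
    logits = [0.0] * n
    for _ in range(iters):
        k = rng.choice(budgets)  # one shared policy is trained on several budgets
        samples = [sample_schedule(logits, k, rng) for _ in range(batch)]
        rewards = [toy_reward(s, importance) for s, _ in samples]
        baseline = sum(rewards) / batch
        for (_, grad), r in zip(samples, rewards):
            for i in range(n):
                logits[i] += lr * (r - baseline) * grad[i] / batch
    return logits


def schedule_for_budget(logits, k):
    """Greedy inference: recompute the k steps with the highest logits."""
    return sorted(sorted(range(len(logits)), key=lambda i: -logits[i])[:k])


if __name__ == "__main__":
    importance = [8, 1, 1, 6, 1, 1, 4, 1, 1, 1]  # hypothetical per-step weights
    logits = train(importance, budgets=(1, 2, 3))
    for k in (1, 2, 3):
        print(k, schedule_for_budget(logits, k))
```

After training, the greedy schedule should favour the steps with the largest hypothetical importance. Sharing one set of logits across budgets and resampling k per batch is only one simple way to let a single policy serve several budgets. The abstract does not specify how ReCache conditions on k.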
📝 Abstract
Modern diffusion models generate high-quality images and videos, but their iterative denoising process makes inference expensive. Feature caching accelerates sampling by reusing or predicting intermediate activations across neighboring denoising steps, exploiting the redundancy of computations along the reverse trajectory. In this work, we focus on the caching schedule: selecting which denoising steps should be fully recomputed. Existing schedules are either fixed (e.g., uniform) or chosen adaptively from per-step error heuristics; in both cases, the actual compute cost is a side-effect of hand-tuned thresholds rather than a quantity the user can specify. We propose ReCache, which inverts this relationship: given a target budget k, it learns the recomputation schedule that maximizes generation quality, turning compute into a directly controllable input. ReCache trains via policy gradients, sidestepping backpropagation through full diffusion inference, and uses no labelled data. Generations from uncached inference serve as matching targets, paired with a reward for generation quality. ReCache is compatible with any caching mechanism, including feature reuse and feature forecasting; for each mechanism, a single trained policy adapts across computational budgets at inference time. ReCache consistently outperforms scheduling baselines: under a $\times5.04$ FLOPs reduction on FLUX, it reduces LPIPS by 31% (from 0.456 to 0.316) compared to DiCache; on Wan 2.1 at a $\sim \times2.6$ speedup, it drops LPIPS by 65% (from 0.480 to 0.169) and boosts the VBench score by 7% (5.6 points, from 70.4 to 76.0) over uniform HiCache. Code is available at https://github.com/thecrazymage/ReCache.
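To make the notion of a caching schedule concrete, here is a minimal sketch in plain Python. It is not the ReCache code. It assumes a hypothetical scalar `expensive_block(t)` standing in for the cached network component (real blocks depend on the current latent, which this toy ignores) and a fixed `uniform_schedule` baseline. It also assumes two generic caching mechanisms: reusing the last computed feature, and forecasting it by linear extrapolation from the last two computations. The linear extrapolation is a simple stand-in and is not claimed to be the forecasting rule of HiCache or any other method. The schedule is the only thing a scheduler such as ReCache chooses; the mechanism stays fixed.

```python
import math


def uniform_schedule(num_steps, k):
    """Fixed baseline: k recomputation steps evenly spaced, starting at step 0."""
    assert 0 < k <= num_steps
    return [i * num_steps // k for i in range(k)]


def run_with_cache(expensive_block, num_steps, schedule, mode="reuse"):
    """Return the feature used at each step when only `schedule` is recomputed.

    mode="reuse":    copy the most recent fully computed feature.
    mode="forecast": extrapolate linearly from the last two computed features
                     (falls back to reuse until two computations exist).
    """
    if mode not in ("reuse", "forecast"):
        raise ValueError(f"unknown mode: {mode}")
    recompute = set(schedule)
    assert 0 in recompute, "step 0 has no cache to read from"
    history = []  # (step, feature) for every full computation so far
    used = []
    for t in range(num_steps):
        if t in recompute:
            feat = expensive_block(t)  # the only expensive call
            history.append((t, feat))
        elif mode == "reuse" or len(history) < 2:
            feat = history[-1][1]
        else:
            (t0, f0), (t1, f1) = history[-2], history[-1]
            feat = f1 + (f1 - f0) * (t - t1) / (t1 - t0)
        used.append(feat)
    return used


def toy_block(t):
    """Hypothetical stand-in for a block whose output drifts smoothly over steps."""
    return math.sin(0.2 * t)


if __name__ == "__main__":
    n, k = 20, 5
    sched = uniform_schedule(n, k)
    exact = [toy_block(t) for t in range(n)]
    for mode in ("reuse", "forecast"):
        used = run_with_cache(toy_block, n, sched, mode)
        err = sum(abs(a - b) for a, b in zip(used, exact)) / n
        print(mode, sched, f"mean abs error {err:.3f}", f"block calls {len(sched)}/{n}")
```

Only the steps in the schedule call the block, so with k recomputations out of N steps the block runs k times instead of N; when that block dominates the cost, this is where the savings come from. On a linear feature trajectory the forecast is exact once two full computations exist, while reuse lags behind. Per the abstract, ReCache keeps the mechanism as given and instead learns where to place the k recomputations so as to maximize generation quality.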