Budget-Constrained Step-Level Diffusion Caching

📅 2026-06-11

🤖 AI Summary

This work addresses a limitation of existing step-level caching methods for diffusion models: they make cache decisions with heuristic error thresholds, so they cannot jointly control generation quality and inference latency under a fixed compute budget. The authors propose BudCache, a framework that replaces threshold-driven cache decisions with explicit budget-constrained optimization. BudCache searches offline for a high-quality cache policy, so inference incurs no online search or thresholding overhead. It solves the combinatorial step-selection problem by combining simulated annealing with deterministic hill climbing. For very tight budgets, it also applies cache-aware schedule alignment, which adapts the time discretization to the selected policy to reduce cache-induced trajectory mismatch. Experiments on FLUX.1-dev and Wan2.1 show that BudCache achieves better generation quality than heuristic caching baselines under the same inference budgets.
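To make the fixed-budget idea concrete, here is a minimal Python sketch of step-level caching driven by a precomputed policy. It is not BudCache's implementation: `cached_sample`, `toy_model`, the toy Euler update, and the "reuse the most recent model output" cache are all simplifying assumptions. What it illustrates is that once the policy is fixed offline, the number of model calls equals the number of `True` entries in the policy, whatever the input.

```python
import numpy as np


def cached_sample(model, x, timesteps, policy):
    """Run a denoising loop under a fixed step-level cache policy (sketch).

    policy[i] is True if step i calls the model and False if it reuses the
    most recent model output. The policy is chosen before sampling, so the
    number of model calls is known in advance and does not depend on x.
    """
    num_steps = len(timesteps) - 1
    assert len(policy) == num_steps, "one cache decision per denoising step"
    assert policy[0], "step 0 must compute: there is no output to reuse yet"
    cached_out = None
    n_calls = 0
    for i in range(num_steps):
        t, t_next = timesteps[i], timesteps[i + 1]
        if policy[i]:
            cached_out = model(x, t)  # full forward pass refreshes the cache
            n_calls += 1
        # Toy Euler update x <- x + (t_next - t) * v; on cached steps v is stale.
        x = x + (t_next - t) * cached_out
    return x, n_calls


def toy_model(x, t):
    """Hypothetical stand-in for a diffusion backbone: velocity v = -x."""
    return -x


ts = np.linspace(1.0, 0.0, 11)  # 10 denoising steps
policy = [True, False, True, False, False, True, False, True, False, False]
out, calls = cached_sample(toy_model, np.ones(4), ts, policy)
print(calls)  # 4
```

A threshold-based method would instead decide each `policy[i]` at runtime from an error estimate, so the number of model calls, and hence latency, could differ from one input to the next.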

📝 Abstract

Step-level caching accelerates diffusion models by exploiting temporal redundancy across denoising steps. Existing methods make per-step cache decisions using threshold-based heuristics, without directly optimizing for final output quality. As a result, their inference latency varies across inputs and is difficult to control at deployment. In this work, we propose BudCache, which inverts this formulation: rather than letting per-step error thresholds dictate the runtime cost, we fix the compute budget in advance and search for the cache policy that best preserves the final output. To tackle the combinatorial complexity of step selection, we combine Simulated Annealing with deterministic Hill Climbing. This offline search identifies high-quality cache policies within minutes and introduces no online search or thresholding overhead during inference. When the compute budget is very tight, we further introduce cache-aware schedule alignment, which adapts the time discretization to the selected cache policy to reduce cache-induced trajectory mismatch. Experiments on FLUX.1-dev and Wan2.1 show that BudCache achieves better generation quality than heuristic caching baselines under the same inference budgets. Code is available at https://github.com/Westlake-AGI-Lab/BudCache
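The offline search over cache policies can be sketched as follows, as an illustration under stated assumptions rather than the authors' code. A policy is a tuple of booleans with exactly `budget` computed steps (step 0 always computed), and every move swaps one computed step with one cached step, so the budget constraint holds by construction. Simulated annealing with geometric cooling explores the space, then deterministic best-improvement hill climbing refines the best policy found. `search_policy`, the swap neighborhood, the cooling schedule, and the `cost` callable are hypothetical; in a real setting `cost` would presumably measure how far the cached final output drifts from the full-compute output on some calibration inputs.

```python
import math
import random


def _split(policy):
    """Indices of computed (on) and cached (off) steps, excluding step 0."""
    on = [i for i in range(1, len(policy)) if policy[i]]
    off = [i for i in range(1, len(policy)) if not policy[i]]
    return on, off


def _swap(policy, i, j):
    """Cache step i and compute step j instead; the budget is unchanged."""
    p = list(policy)
    p[i], p[j] = False, True
    return tuple(p)


def search_policy(cost, num_steps, budget, iters=2000, t0=1.0, t_end=1e-3, seed=0):
    """Minimize cost(policy) over policies with exactly `budget` computed steps."""
    assert 1 <= budget <= num_steps
    rng = random.Random(seed)
    # Start from evenly spaced computed steps (uniform caching).
    start = {k * num_steps // budget for k in range(budget)}
    current = tuple(i in start for i in range(num_steps))
    cur_cost = cost(current)
    best, best_cost = current, cur_cost
    if not 1 < budget < num_steps:
        return best, best_cost  # no budget-preserving swap exists

    # Phase 1: simulated annealing with geometric cooling from t0 to t_end.
    for k in range(iters):
        temp = t0 * (t_end / t0) ** (k / max(iters - 1, 1))
        on, off = _split(current)
        cand = _swap(current, rng.choice(on), rng.choice(off))
        c = cost(cand)
        if c <= cur_cost or rng.random() < math.exp(-(c - cur_cost) / temp):
            current, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand, c

    # Phase 2: deterministic hill climbing, always taking the best swap.
    while True:
        on, off = _split(best)
        neighbors = (_swap(best, i, j) for i in on for j in off)
        c, cand = min((cost(p), p) for p in neighbors)
        if c >= best_cost:
            return best, best_cost  # local optimum under single swaps
        best, best_cost = cand, c


# Usage with a hypothetical toy cost: penalize each "important" step left cached.
target = {0, 2, 5, 9}


def toy_cost(p):
    return sum(1 for i in target if not p[i])


best, c = search_policy(toy_cost, num_steps=12, budget=4)
print(c)  # 0
```

Because the search runs once per model and budget, its cost is paid offline; at inference time the resulting tuple is simply looked up at each step.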

Problem

Research questions and friction points this paper is trying to address.

step-level caching

diffusion models

compute budget

inference latency

output quality

Innovation

Methods, ideas, or system contributions that make the work stand out.

step-level caching

budget-constrained optimization

diffusion models