Budget-Constrained Step-Level Diffusion Caching

📅 2026-06-11
🤖 AI Summary
Existing step-level caching methods for diffusion models make per-step cache decisions with threshold-based heuristics. They do not directly optimize final output quality, and their inference latency varies across inputs, which makes it hard to control at deployment. The authors propose BudCache, which fixes the compute budget in advance and searches offline for the cache policy that best preserves the final output. The search combines simulated annealing with deterministic hill climbing, finds high-quality policies within minutes, and adds no online search or thresholding overhead at inference. For very tight budgets, a cache-aware schedule alignment step adapts the time discretization to the selected cache policy to reduce cache-induced trajectory mismatch. On FLUX.1-dev and Wan2.1, BudCache achieves better generation quality than heuristic caching baselines under the same inference budgets.
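To illustrate why a cache policy chosen offline gives a predictable inference cost, here is a minimal sketch of a sampler that follows a fixed compute/reuse mask. It is not the authors' implementation: `sample_with_policy`, `toy_velocity`, the scalar state, and the plain Euler update are all hypothetical stand-ins chosen for illustration.

```python
from __future__ import annotations

from typing import Callable, Sequence


def sample_with_policy(
    model: Callable[[float, float], float],
    x: float,
    timesteps: Sequence[float],
    compute_mask: Sequence[bool],
) -> tuple[float, int]:
    """Euler sampler that reuses the last model output on cached steps.

    compute_mask[i] is True when step i calls the model and False when it
    reuses the most recent output. The mask is fixed before sampling, so
    no per-step threshold is evaluated at run time.
    """
    if len(compute_mask) != len(timesteps) - 1:
        raise ValueError("need exactly one mask entry per step")
    if not compute_mask[0]:
        raise ValueError("the first step must be computed; nothing is cached yet")

    cached = 0.0  # last model output (assumed illustrative scalar)
    calls = 0
    for i, compute in enumerate(compute_mask):
        t, t_next = timesteps[i], timesteps[i + 1]
        if compute:
            cached = model(x, t)
            calls += 1
        x = x + (t_next - t) * cached
    return x, calls


def toy_velocity(x: float, t: float) -> float:
    """Hypothetical stand-in for a denoiser; a real model is a neural network."""
    return -x


timesteps = [i / 8 for i in range(9)]
mask = [True, False, True, False, True, True, False, True]
x_final, n_calls = sample_with_policy(toy_velocity, 1.0, timesteps, mask)
x_full, n_full = sample_with_policy(toy_velocity, 1.0, timesteps, [True] * 8)
```

In this sketch the number of model calls equals the number of `True` entries in the mask, regardless of the input, which is the property that makes the cost controllable in advance.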
📝 Abstract
Step-level caching accelerates diffusion models by exploiting temporal redundancy across denoising steps. Existing methods make per-step cache decisions using threshold-based heuristics, without directly optimizing for final output quality. As a result, their inference latency varies across inputs and is difficult to control at deployment. In this work, we propose BudCache, which inverts this formulation: rather than letting per-step error thresholds dictate the runtime cost, we fix the compute budget in advance and search for the cache policy that best preserves the final output. To tackle the combinatorial complexity of step selection, we combine Simulated Annealing with deterministic Hill Climbing. This offline search identifies high-quality cache policies within minutes and introduces no online search or thresholding overhead during inference. When the compute budget is very tight, we further introduce cache-aware schedule alignment, which adapts the time discretization to the selected cache policy to reduce cache-induced trajectory mismatch. Experiments on FLUX.1-dev and Wan2.1 show that BudCache achieves better generation quality than heuristic caching baselines under the same inference budgets. Code is available at https://github.com/Westlake-AGI-Lab/BudCache
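The budget-constrained search described in the abstract can be sketched as follows, under loud assumptions: the scalar toy dynamics in `run`, the swap-one-step neighborhood, the geometric cooling schedule, and every function name are hypothetical choices for illustration, not details taken from the paper. In practice the loss would compare generated outputs against full-compute references rather than a scalar.

```python
import math
import random


def run(mask, n_steps):
    """Toy Euler sampler; cached steps (mask[i] False) reuse the last velocity."""
    x, v = 1.0, 0.0
    h = 1.0 / n_steps
    for i in range(n_steps):
        if mask[i]:
            v = -x * (1.0 + 3.0 * i * h)  # assumed toy dynamics, not a real model
        x += h * v
    return x


def make_loss(n_steps):
    """Loss = distance of the cached result from the full-compute result."""
    ref = run([True] * n_steps, n_steps)
    return lambda mask: abs(run(mask, n_steps) - ref)


def split(mask):
    """Indices of computed and cached steps; step 0 is always kept computed."""
    on = [i for i in range(1, len(mask)) if mask[i]]
    off = [i for i in range(1, len(mask)) if not mask[i]]
    return on, off


def swapped(mask, i, j):
    """Move one computed step from i to j, so the budget stays unchanged."""
    m = list(mask)
    m[i], m[j] = False, True
    return m


def anneal(loss, mask, iters, t0, t1, rng):
    """Simulated annealing over fixed-budget masks with geometric cooling."""
    cur, cur_loss = list(mask), loss(mask)
    best, best_loss = list(cur), cur_loss
    for k in range(iters):
        temp = t0 * (t1 / t0) ** (k / max(1, iters - 1))
        on, off = split(cur)
        if not on or not off:
            break
        cand = swapped(cur, rng.choice(on), rng.choice(off))
        cand_loss = loss(cand)
        delta = cand_loss - cur_loss
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cur, cur_loss = cand, cand_loss
            if cur_loss < best_loss:
                best, best_loss = list(cur), cur_loss
    return best, best_loss


def hill_climb(loss, mask):
    """Deterministic refinement: take improving swaps until none remains."""
    cur, cur_loss = list(mask), loss(mask)
    improved = True
    while improved:
        improved = False
        on, off = split(cur)
        for cand in (swapped(cur, i, j) for i in on for j in off):
            cand_loss = loss(cand)
            if cand_loss < cur_loss:
                cur, cur_loss, improved = cand, cand_loss, True
                break
    return cur, cur_loss


n_steps, budget = 20, 8
loss = make_loss(n_steps)
start_idx = {(k * n_steps) // budget for k in range(budget)}  # evenly spaced start
init = [i in start_idx for i in range(n_steps)]
init_loss = loss(init)
sa_mask, sa_loss = anneal(loss, init, 2000, 1e-2, 1e-5, random.Random(0))
best_mask, best_loss = hill_climb(loss, sa_mask)
```

Because every move swaps one computed step for one cached step, each candidate policy uses exactly `budget` model calls, so the search never leaves the feasible set. The final hill-climbing pass guarantees only a local optimum with respect to the swap neighborhood.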
Problem

Research questions and friction points this paper is trying to address.

step-level caching
diffusion models
compute budget
inference latency
output quality
Innovation

Methods, ideas, or system contributions that make the work stand out.

step-level caching
budget-constrained optimization
diffusion models
cache-aware scheduling
combinatorial search