ProCache: Constraint-Aware Feature Caching with Selective Computation for Diffusion Transformer Acceleration

📅 2025-12-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address error accumulation and limited efficiency in feature caching for real-time deployment of Diffusion Transformers (DiTs)—caused by fixed-interval caching and full-feature reuse—this paper proposes a training-free dynamic feature caching framework. Leveraging the time-varying evolution of DiT features, the framework introduces a temporal-aware, non-uniform caching scheduler that enables selective computation at both the depth and token levels. It further couples constraint-aware cache pattern search with lightweight cache state prediction to support adaptive, online caching decisions. Evaluated on PixArt-α and DiT models, the method achieves 1.96× and 2.90× inference speedup, respectively, while incurring less than 0.5% degradation in FID and CLIP scores—substantially outperforming existing approaches.

📝 Abstract
Diffusion Transformers (DiTs) have achieved state-of-the-art performance in generative modeling, yet their high computational cost hinders real-time deployment. While feature caching offers a promising training-free acceleration solution by exploiting temporal redundancy, existing methods suffer from two key limitations: (1) uniform caching intervals fail to align with the non-uniform temporal dynamics of DiT, and (2) naive feature reuse with excessively large caching intervals can lead to severe error accumulation. In this work, we analyze the evolution of DiT features during denoising and reveal that both feature changes and error propagation are highly time- and depth-varying. Motivated by this, we propose ProCache, a training-free dynamic feature caching framework that addresses these issues via two core components: (i) a constraint-aware caching pattern search module that generates non-uniform activation schedules through offline constrained sampling, tailored to the model's temporal characteristics; and (ii) a selective computation module that selectively computes within deep blocks and high-importance tokens for cached segments to mitigate error accumulation with minimal overhead. Extensive experiments on PixArt-alpha and DiT demonstrate that ProCache achieves up to 1.96x and 2.90x acceleration with negligible quality degradation, significantly outperforming prior caching-based methods.
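The selective computation idea described above—reusing cached features for most tokens at a cached step while recomputing only the high-importance ones—can be illustrated with a minimal sketch. This is not the paper's implementation; `fresh_fn`, the importance scores, and the `top_frac` budget are all illustrative assumptions.

```python
import numpy as np

def selective_token_update(cached, fresh_fn, importance, top_frac=0.25):
    """Selective computation at the token level for a cached step.

    cached:     (N, D) token features cached from the last fully computed step.
    fresh_fn:   callable taking an index array and returning recomputed features
                for those tokens (stands in for running the deep DiT blocks;
                a placeholder, not the paper's actual interface).
    importance: (N,) per-token importance scores (illustrative criterion).
    top_frac:   fraction of tokens to recompute (assumed budget parameter).
    """
    n_tokens = cached.shape[0]
    k = max(1, int(round(top_frac * n_tokens)))
    top = np.argsort(-importance)[:k]   # highest-importance tokens first
    out = cached.copy()                 # reuse the cache everywhere else
    out[top] = fresh_fn(top)            # selectively recompute the "hot" tokens
    return out
```

Only a `top_frac` fraction of tokens pays the full per-block cost, which is how partial recomputation can curb error accumulation at a small fraction of the cost of a full forward pass.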
Problem

Research questions and friction points this paper is trying to address.

Accelerates Diffusion Transformers for real-time use
Optimizes non-uniform caching to match DiT dynamics
Reduces error accumulation in feature reuse
Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-uniform caching schedules via constrained sampling
Selective computation in deep blocks and high-importance tokens
Training-free dynamic feature caching for diffusion transformers
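The non-uniform schedule idea can be sketched as follows: given per-step feature-change magnitudes measured offline, allocate a fixed compute budget so that fully computed steps cluster where features evolve fastest. This greedy selection is a simplified stand-in for the paper's constrained sampling; the score definition and budget are assumptions for illustration.

```python
import numpy as np

def build_cache_schedule(change_scores, num_active):
    """Build a non-uniform activation schedule for feature caching.

    change_scores: per-step feature-change magnitudes (offline measurement;
                   larger means features evolve faster at that step).
    num_active:    compute budget = number of fully computed steps.
    Returns a boolean mask: True = compute this step, False = reuse cache.
    """
    scores = np.asarray(change_scores, dtype=float)
    num_steps = len(scores)
    mask = np.zeros(num_steps, dtype=bool)
    mask[0] = True  # the first step must be computed: nothing is cached yet
    # Greedy proxy for constrained sampling: spend the remaining budget on the
    # steps with the largest feature change, so activations are denser where
    # the temporal dynamics are fastest (non-uniform intervals fall out).
    order = np.argsort(-scores[1:]) + 1
    mask[order[: num_active - 1]] = True
    return mask
```

With uniform scores this degenerates to an arbitrary fixed-budget schedule; the benefit appears precisely when the change profile is non-uniform, which is the paper's empirical observation about DiT features.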
Fanpu Cao
South China University of Technology, China
Yaofo Chen
South China University of Technology
Large Language Models, AutoML, Model Adaptation, Robustness
Zeng You
South China University of Technology, Peng Cheng Laboratory
Computer Vision, Action Detection, Efficient Network Architecture
Wei Luo
South China Agricultural University, China
Cen Chen
South China University of Technology, China