Accelerating Diffusion Transformer via Gradient-Optimized Cache

📅 2025-03-07
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
To address progressive error accumulation and inaccurate error correction under dynamic perturbations caused by high-ratio feature caching (>50%) in Diffusion Transformers (DiTs), this paper proposes a gradient-optimized caching mechanism. The method introduces: (1) a cached-gradient propagation scheme that dynamically computes and weights the gradient discrepancy between cached and recomputed features and propagates it to subsequent denoising steps; and (2) inflection-point-aware optimization, which identifies critical denoising steps via trajectory statistics so that gradient update directions stay aligned and conflicting updates are avoided. Evaluated on ImageNet with 50% block-level caching, the approach achieves a 26.3% improvement in Inception Score (IS) to 216.28 and a 43% reduction in Fréchet Inception Distance (FID) to 3.907, with zero additional computational overhead. It also remains robust across diverse caching ratios.
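For concreteness, one plausible form of the weighted cached-gradient correction summarized above; the notation and the weight $\lambda_t$ are our assumptions, not taken from the paper:

$$\hat{F}_t = F_t^{\text{cache}} + \lambda_t \, g_{t_0}, \qquad g_{t_0} = F_{t_0}^{\text{recomp}} - F_{t_0}^{\text{cache}},$$

where $t_0$ is the most recent denoising step at which the block was actually recomputed and $\lambda_t$ controls how strongly the stored gradient is propagated to step $t$.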

📝 Abstract
Feature caching has emerged as an effective strategy to accelerate diffusion transformer (DiT) sampling through temporal feature reuse. It remains challenging because (1) progressive error accumulation from cached blocks significantly degrades generation quality, particularly when over 50% of blocks are cached, and (2) current error-compensation approaches neglect dynamic perturbation patterns during the caching process, leading to suboptimal error correction. To solve these problems, we propose the Gradient-Optimized Cache (GOC) with two key innovations: (1) Cached Gradient Propagation: a gradient queue dynamically computes the gradient differences between cached and recomputed features; these gradients are weighted and propagated to subsequent steps, directly compensating for the approximation errors introduced by caching. (2) Inflection-Aware Optimization: through statistical analysis of feature-variation patterns, we identify critical inflection points where the denoising trajectory changes direction; by aligning gradient updates with these detected phases, we prevent conflicting gradient directions during error correction. Extensive evaluations on ImageNet demonstrate GOC's superior trade-off between efficiency and quality. With 50% cached blocks, GOC achieves IS 216.28 (26.3% higher) and FID 3.907 (43% lower) compared to the baseline DiT, while maintaining identical computational costs. These improvements persist across various cache ratios, demonstrating robust adaptability to different acceleration requirements.
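A minimal sketch of the cached-gradient-propagation idea described in the abstract, assuming a per-block feature cache and a fixed exponential down-weighting of older gradients; the class and parameter names (GradientQueue, decay, block_forward) are hypothetical and not taken from the paper's code:

```python
from collections import deque

import torch


class GradientQueue:
    """Hypothetical gradient queue for Gradient-Optimized Cache (GOC).

    When a block is recomputed, store the difference between the fresh
    feature and the stale cached one.  When a later step reuses the cache,
    add a weighted sum of the stored differences to the cached feature to
    compensate for the caching error.
    """

    def __init__(self, maxlen: int = 4, decay: float = 0.5):
        self.queue = deque(maxlen=maxlen)  # most recent gradients, newest last
        self.decay = decay                 # down-weighting of older gradients

    def push(self, recomputed: torch.Tensor, cached: torch.Tensor) -> None:
        # Gradient discrepancy between the recomputed and cached features.
        self.queue.append(recomputed - cached)

    def correct(self, cached: torch.Tensor, weight: float = 1.0) -> torch.Tensor:
        # Propagate the queued gradients onto the current cached feature.
        corrected = cached
        for age, grad in enumerate(reversed(self.queue)):
            corrected = corrected + weight * (self.decay ** age) * grad
        return corrected


def block_forward(block, x, gq: GradientQueue, cache: dict, reuse: bool, weight: float = 1.0):
    """Run one transformer block, either reusing the cache or recomputing."""
    if reuse and "feat" in cache:
        # Cache hit: reuse the stored feature, corrected by queued gradients.
        return gq.correct(cache["feat"], weight=weight)
    # Cache miss: recompute, log the discrepancy, refresh the cache.
    feat = block(x)
    if "feat" in cache:
        gq.push(feat, cache["feat"])
    cache["feat"] = feat
    return feat
```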
Problem

Research questions and friction points this paper is trying to address.

Progressive error accumulation from cached blocks degrades generation quality in diffusion transformers, especially at high cache ratios.
Existing error compensation ignores dynamic perturbation patterns during the feature-caching process, so corrections are inaccurate.
The efficiency-quality trade-off of diffusion transformer sampling needs improvement under aggressive caching.
Innovation

Methods, ideas, or system contributions that make the work stand out.

A gradient queue dynamically computes the differences between cached and recomputed features and propagates them as weighted corrections (see the sketch under the abstract)
Inflection-aware optimization aligns gradient updates with direction changes in the denoising trajectory (see the sketch below)
GOC improves generation quality at identical computational cost, yielding a better efficiency-quality trade-off across cache ratios
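A rough sketch of how inflection points could be detected from feature-variation statistics and used to gate the gradient correction, assuming a sign-change test on consecutive feature increments; the function names, the cosine-similarity criterion, and the hard zeroing of the weight are our assumptions, not the paper's exact rule:

```python
import torch


def find_inflection_steps(features, eps: float = 1e-8):
    """Flag denoising steps where the feature trajectory changes direction.

    `features` is a list of per-step feature tensors collected from a short
    probe run.  A step is marked as an inflection point when the cosine
    similarity between consecutive feature increments turns negative, i.e.
    the update direction flips between neighbouring steps.
    """
    diffs = [features[t + 1] - features[t] for t in range(len(features) - 1)]
    inflections = set()
    for t in range(len(diffs) - 1):
        a, b = diffs[t].flatten(), diffs[t + 1].flatten()
        cos = torch.dot(a, b) / (a.norm() * b.norm() + eps)
        if cos < 0:  # direction reversal between increments
            inflections.add(t + 1)
    return inflections


def correction_weight(step: int, inflections: set, base_weight: float = 1.0) -> float:
    """Suppress the cached-gradient correction at detected inflection steps so
    the propagated gradients cannot push against the new trajectory direction."""
    return 0.0 if step in inflections else base_weight
```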
Junxiang Qiu
University of Science and Technology of China, Hefei, China
Lin Liu
University of Science and Technology of China, Hefei, China
Shuo Wang
University of Science and Technology of China, Hefei, China
Jinda Lu
University of Science and Technology of China
Kezhou Chen
University of Science and Technology of China, Hefei, China
Yanbin Hao
Hefei University of Technology
Video retrieval · Video action recognition · Hashing · Video Hyperlinking