Accelerate High-Quality Diffusion Models with Inner Loop Feedback

πŸ“… 2025-01-22
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Slow inference hinders the practical deployment of high-fidelity diffusion models. Method: This paper proposes an Inner Loop Feedback (ILF) mechanism that trains a lightweight, learnable feedback module to predict future denoising features, enabling efficient acceleration. It introduces a feedback architecture built from backbone-isomorphic blocks, adopts zero-initialized scaling factors to modulate feedback strength, and freezes the backbone network while distilling knowledge exclusively into the feedback module. Results: Evaluated on DiT and PixArt-alpha/sigma, ILF achieves 1.7-1.8x speedups while matching the FID, CLIP Score, and ImageReward of the original 20-step baseline, significantly outperforming existing 1-4-step acceleration methods in quality. To our knowledge, this is the first approach to combine backbone-aware temporal feature prediction with efficient knowledge distillation while preserving state-of-the-art generation quality.
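The summary's key architectural idea can be illustrated with a minimal sketch: the feedback module reuses the structure of a backbone block, and a zero-initialized learnable scale gates its contribution, so at the start of training the feedback has no effect on the diffusion forward pass. All names and shapes below are illustrative, not the authors' actual code.

```python
import torch
import torch.nn as nn

class FeedbackBlock(nn.Module):
    """Hypothetical ILF-style feedback module: a block isomorphic to one
    backbone block, gated by a zero-initialized learnable scaling factor."""
    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block                         # same settings as the backbone block
        self.scale = nn.Parameter(torch.zeros(1))  # zero init: identity at step 0

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # Predicted future feature = current feature + gated correction.
        return h + self.scale * self.block(h)

# At initialization the module is an exact identity, so inserting it
# cannot degrade the pretrained backbone's outputs.
fb = FeedbackBlock(nn.Linear(16, 16))  # toy stand-in for a DiT block
h = torch.randn(4, 16)
assert torch.equal(fb(h), h)
```

The zero-initialized gate is what makes it safe to bolt the module onto a frozen, pretrained backbone: feedback strength grows only as distillation training pushes the scale away from zero.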

πŸ“ Abstract
We propose Inner Loop Feedback (ILF), a novel approach to accelerate diffusion models' inference. ILF trains a lightweight module to predict future features in the denoising process by leveraging the outputs from a chosen diffusion backbone block at a given time step. This approach exploits two key intuitions: (1) the outputs of a given block at adjacent time steps are similar, and (2) performing partial computations for a step imposes a lower burden on the model than skipping the step entirely. Our method is highly flexible, since we find that the feedback module itself can simply be a block from the diffusion backbone, with all settings copied. Its influence on the diffusion forward pass can be tempered with a learnable scaling factor initialized at zero. We train this module using distillation losses; however, unlike some prior work where a full diffusion backbone serves as the student, our model freezes the backbone and trains only the feedback module. While many efforts to optimize diffusion models focus on achieving acceptable image quality in extremely few steps (1-4 steps), our emphasis is on matching best-case results (typically achieved in 20 steps) while significantly reducing runtime. ILF achieves this balance effectively, demonstrating strong performance for both class-to-image generation with diffusion transformer (DiT) and text-to-image generation with DiT-based PixArt-alpha and PixArt-sigma. The quality of ILF's 1.7x-1.8x speedups is confirmed by FID, CLIP score, CLIP Image Quality Assessment, ImageReward, and qualitative comparisons.
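The training setup the abstract describes, freezing the backbone and distilling only into the feedback module, can be sketched as a single optimization step. Everything here is a hedged toy stand-in (a linear layer for a diffusion block, one regression step for the distillation loss); the paper's actual losses and targets may differ.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

backbone_block = nn.Linear(16, 16)           # stand-in for one diffusion backbone block
for p in backbone_block.parameters():
    p.requires_grad = False                  # backbone stays frozen throughout training

# Feedback module: a copy of the block plus a zero-initialized learnable gate.
feedback_block = nn.Linear(16, 16)
scale = nn.Parameter(torch.zeros(1))
opt = torch.optim.AdamW(list(feedback_block.parameters()) + [scale], lr=1e-4)

# One distillation step: push the feedback prediction toward the feature the
# frozen backbone would compute at the adjacent denoising step.
h_t = torch.randn(4, 16)                     # block output at step t
with torch.no_grad():
    target = backbone_block(h_t)             # illustrative proxy for the next-step feature
pred = h_t + scale * feedback_block(h_t)     # partial computation, not a full step skip
loss = nn.functional.mse_loss(pred, target)
loss.backward()
opt.step()
```

Because the backbone's parameters have `requires_grad=False`, gradients flow only into the feedback module and its gate, matching the abstract's claim that the backbone is never the distillation student.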
Problem

Research questions and friction points this paper is trying to address.

Diffusion Models
Quality Improvement
Speed Optimization
Innovation

Methods, ideas, or system contributions that make the work stand out.

ILF (Inner Loop Feedback)
Efficient Image Generation
Predictive and Memory Mechanism
πŸ”Ž Similar Papers
No similar papers found.