Ditto: Accelerating Diffusion Model via Temporal Value Similarity

📅 2025-01-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Diffusion models suffer from high computational overhead and low energy efficiency during inference. To address this, we propose a synergistic acceleration framework that integrates temporal differencing with quantization. We first observe that latent values at adjacent time steps of the diffusion process are highly similar, so their differences follow a narrow distribution; this enables a temporal differencing paradigm and a dynamic execution-flow optimization. Building on this insight, we co-design algorithmic and hardware optimizations: quantization-aware difference computation, a layer-wise optimization exploiting the distributive property of layer operations, a customized hardware accelerator (Ditto), and memory-aware scheduling. Experimental results demonstrate up to 1.5× higher throughput and 17.74% lower energy consumption than state-of-the-art acceleration methods.
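
The core observation is that latents at adjacent time steps are nearly identical, so their element-wise difference spans a far narrower value range than the latents themselves and can be encoded with fewer bits. A minimal sketch of that idea in NumPy, using random placeholder latents and a generic uniform symmetric quantizer (not the paper's actual quantization scheme):

```python
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x onto a signed integer grid."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(np.abs(x).max(), 1e-12) / qmax
    return np.round(x / scale).astype(np.int32), scale

rng = np.random.default_rng(0)
x_t = rng.standard_normal(4096).astype(np.float32)                    # latent at time step t (placeholder)
x_next = x_t + 0.01 * rng.standard_normal(4096).astype(np.float32)    # similar latent at step t+1

diff = x_next - x_t
print(f"latent range: {np.ptp(x_next):.3f}, diff range: {np.ptp(diff):.3f}")

q_latent, _ = quantize(x_next, bits=8)   # the raw latent needs full bit-width
q_diff, _ = quantize(diff, bits=4)       # the narrow difference fits in fewer bits
```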

📝 Abstract
Diffusion models achieve superior performance in image generation tasks. However, they incur significant computation overhead due to their iterative structure. To address this overhead, we analyze the iterative structure and observe that adjacent time steps in diffusion models exhibit high value similarity, leading to narrow differences between consecutive time steps. We apply this characteristic to a quantized diffusion model and reveal that the majority of these differences can be represented with a reduced bit-width, or are even zero. Based on these observations, we propose the Ditto algorithm, a difference processing algorithm that combines temporal similarity with quantization to enhance the efficiency of diffusion models. By exploiting the narrow differences and the distributive property of layer operations, it performs full bit-width operations only for the initial time step and processes subsequent steps with temporal differences. In addition, the Ditto execution-flow optimization mitigates the memory overhead of temporal difference processing, further boosting the efficiency of the Ditto algorithm. We also design the Ditto hardware, a specialized accelerator that fully exploits the dynamic characteristics of the proposed algorithm. As a result, the Ditto hardware achieves up to 1.5× speedup and 17.74% energy savings compared to other accelerators.
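
The differencing step rests on the distributive property the abstract mentions: for a linear layer, W·x_t = W·x_{t-1} + W·(x_t − x_{t-1}), so after one full bit-width pass at the initial time step, later steps only need to push the narrow difference through the layer and accumulate onto the cached output. A hedged sketch under that assumption, using a plain floating-point linear layer in NumPy (the actual Ditto pipeline operates on quantized integer differences with dedicated hardware support):

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.standard_normal((256, 256)).astype(np.float32)  # weights of one linear layer

x_prev = rng.standard_normal(256).astype(np.float32)
y_prev = W @ x_prev                      # initial time step: full bit-width pass

for t in range(1, 5):                    # subsequent time steps
    x_t = x_prev + 0.01 * rng.standard_normal(256).astype(np.float32)
    delta = x_t - x_prev                 # narrow temporal difference
    y_t = y_prev + W @ delta             # distributive property: reuse cached output
    assert np.allclose(y_t, W @ x_t, atol=1e-3)
    x_prev, y_prev = x_t, y_t
```

Since many quantized difference elements are zero, the per-step product W @ delta is where reduced bit-width and sparsity can save computation, which is what the specialized hardware is designed to exploit.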
Problem

Research questions and friction points this paper is trying to address.

Diffusion Models
Efficiency Improvement
Resource Consumption Reduction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ditto Algorithm
Quantized Diffusion Model
Energy-Efficient Hardware
Sungbin Kim
School of Electrical and Electronic Engineering, Yonsei University, Seoul, Republic of Korea
Hyunwuk Lee
School of Electrical and Electronic Engineering, Yonsei University, Seoul, Republic of Korea
Wonho Cho
School of Electrical and Electronic Engineering, Yonsei University, Seoul, Republic of Korea
Mincheol Park
Samsung Electronics
Artificial Intelligence, Computer Architecture
Won Woo Ro
Yonsei University
computer architecture, parallel processing, microprocessor, GPU