PipeDiT: Accelerating Diffusion Transformers in Video Generation with Task Pipelining and Model Decoupling

📅 2025-11-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the slow inference speed and high GPU memory consumption of diffusion transformer (DiT)-based video generation models, this work proposes a system-level acceleration framework named PipeDiT. First, it introduces PipeSP, a pipelining algorithm for sequence parallelism that overlaps latent-generation computation with inter-GPU communication. Second, it proposes DeDiVAE, a mechanism that decouples the diffusion module from the VAE encoder/decoder across two GPU groups whose executions are also pipelined. Third, it proposes attention co-processing (Aco), which improves utilization of the otherwise idle GPU resources in the VAE group by sharing attention computation. The framework is integrated into OpenSoraPlan and HunyuanVideo. Experiments on two 8-GPU systems demonstrate end-to-end speedups of 1.06x to 4.02x, significantly improving throughput and GPU memory efficiency. The approach establishes a scalable, distributed inference paradigm for large-scale video generation models.

📝 Abstract
Video generation has been advancing rapidly, and diffusion transformer (DiT) based models have demonstrated remarkable capabilities. However, their practical deployment is often hindered by slow inference speeds and high memory consumption. In this paper, we propose a novel pipelining framework named PipeDiT to accelerate video generation, which is equipped with three main innovations. First, we design a pipelining algorithm (PipeSP) for sequence parallelism (SP) to enable the computation of latent generation and communication among multiple GPUs to be pipelined, thus reducing inference latency. Second, we propose DeDiVAE to decouple the diffusion module and the variational autoencoder (VAE) module into two GPU groups, whose executions can also be pipelined to reduce memory consumption and inference latency. Third, to better utilize the GPU resources in the VAE group, we propose an attention co-processing (Aco) method to further reduce the overall video generation latency. We integrate our PipeDiT into both OpenSoraPlan and HunyuanVideo, two state-of-the-art open-source video generation frameworks, and conduct extensive experiments on two 8-GPU systems. Experimental results show that, under many common resolution and timestep configurations, our PipeDiT achieves 1.06x to 4.02x speedups over OpenSoraPlan and HunyuanVideo.
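The core of the PipeSP idea described in the abstract, overlapping latent computation with inter-GPU communication rather than running them back-to-back, can be illustrated with a minimal toy sketch. The function names, the thread-based "communication", and the sub-chunk splitting below are illustrative assumptions, not the paper's actual API or implementation:

```python
import threading
import time

# Toy sketch of compute/communication pipelining: instead of computing a
# whole latent and then communicating it, the sequence is split into
# sub-chunks so the communication of chunk i overlaps the computation of
# chunk i+1. All names and sleeps here are stand-ins, not the paper's code.

def compute(chunk):
    time.sleep(0.01)          # stand-in for DiT attention/MLP on one sub-sequence
    return chunk * 2

def communicate(result, out):
    time.sleep(0.01)          # stand-in for the SP all-gather/all-to-all
    out.append(result)

def pipelined_step(chunks):
    out, workers = [], []
    for c in chunks:
        r = compute(c)                                   # compute sub-chunk i
        t = threading.Thread(target=communicate, args=(r, out))
        t.start()                                        # its communication runs in
        workers.append(t)                                # the background while the
    for t in workers:                                    # next compute() proceeds
        t.join()
    return sorted(out)

print(pipelined_step([1, 2, 3, 4]))  # [2, 4, 6, 8]
```

With the equal 0.01 s stand-in costs, the serial version would take roughly 2n delays while the pipelined loop takes roughly n + 1, which is the latency reduction the abstract attributes to PipeSP.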
Problem

Research questions and friction points this paper is trying to address.

Accelerates slow inference in diffusion transformer video generation
Reduces high memory consumption in video generation models
Optimizes GPU resource usage through pipelining and decoupling
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pipelining algorithm overlaps latent-generation computation with inter-GPU communication
Decoupling the diffusion and VAE modules reduces memory consumption and latency
Attention co-processing method uses idle GPU resources in the VAE group
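The DeDiVAE decoupling named above is, at heart, a producer/consumer pipeline between two GPU groups: the diffusion group streams finished latents to the VAE group, so decoding of clip i overlaps denoising of clip i+1. The sketch below models this with a bounded queue; the function names and the `+ 100` "decode" are illustrative stand-ins, not the paper's implementation:

```python
import queue
import threading

# Toy producer/consumer sketch of diffusion/VAE decoupling across two GPU
# groups. The diffusion group pushes denoised latents into a bounded queue;
# the VAE group pulls and decodes them concurrently. A None sentinel marks
# the end of the stream. All values here are stand-ins.

def diffusion_group(num_clips, latents):
    for i in range(num_clips):
        latents.put(i)            # stand-in for a denoised latent for clip i
    latents.put(None)             # sentinel: no more latents

def vae_group(latents, frames):
    while (lat := latents.get()) is not None:
        frames.append(lat + 100)  # stand-in for VAE decode to pixel frames

latents, frames = queue.Queue(maxsize=2), []
prod = threading.Thread(target=diffusion_group, args=(4, latents))
cons = threading.Thread(target=vae_group, args=(latents, frames))
prod.start(); cons.start()
prod.join(); cons.join()
print(frames)  # [100, 101, 102, 103]
```

The bounded queue (`maxsize=2`) also captures the memory benefit: the diffusion group never holds more than a couple of undecoded latents in flight, instead of keeping the full VAE resident alongside the diffusion model on every GPU.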
Sijie Wang
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen
Qiang Wang
School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen
Shaohuai Shi
Professor, Harbin Institute of Technology, Shenzhen
Machine Learning Systems · Parallel and Distributed Computing · GPU Computing · Deep Learning