Making Time Editable in Video Diffusion Transformers

📅 2026-06-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the limited control that existing video Diffusion Transformer (DiT) models offer over temporal dynamics and temporal editing. The authors propose a lightweight temporal control module that enables explicit manipulation of motion speed and temporal structure without altering the pre-trained DiT backbone. Because the backbone is left unchanged, the method retains the generative prior learned during pre-training while extending the range of dynamics that can be controlled. The approach is presented as a practical way to perform temporal editing in diffusion-based video synthesis.
📝 Abstract
Modern Diffusion Transformers for video generation provide limited control over the progression of time and the editing of temporal dynamics. We propose a temporal-control methodology that extends a pretrained DiT with explicit time editing, allowing control over motion speed and temporal structure without redesigning the backbone. Its core implementation augments the pretrained model with a lightweight temporal module, preserving the original generative prior while expanding its controllable dynamic range.
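The abstract does not say how motion speed is parameterized, so the following is only an illustrative sketch of what "explicit time editing" could mean at the input level, not the authors' method. It assumes a hypothetical helper, `warp_timestamps`, that turns per-interval speed multipliers into continuous time positions for each frame; a temporal module could then be conditioned on these positions instead of on uniform frame indices.

```python
from itertools import accumulate


def warp_timestamps(speeds):
    """Map per-interval speed multipliers to continuous frame timestamps.

    Hypothetical helper for illustration only (not from the paper).
    `speeds[i]` is the amount of scene time that elapses between frame i
    and frame i + 1: 1.0 is normal speed, 0.5 is slow motion, 2.0 is
    fast-forward. Frame 0 is placed at t = 0.
    """
    if any(s < 0 for s in speeds):
        raise ValueError("speed multipliers must be non-negative")
    # Cumulative sum of the intervals gives the timestamp of each later frame.
    return [0.0] + list(accumulate(float(s) for s in speeds))


# Six frames: normal speed, then slow motion, then a fast-forward step.
timestamps = warp_timestamps([1, 1, 0.5, 0.5, 2])
print(timestamps)  # [0.0, 1.0, 2.0, 2.5, 3.0, 5.0]
```

With all multipliers set to 1.0 the timestamps reduce to the ordinary frame indices, which is one way such a control signal could leave the pretrained behavior unchanged by default.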
Problem

Research questions and friction points this paper is trying to address.

video generation
temporal control
diffusion transformers
time editing
motion dynamics
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal Control
Video Diffusion Transformers
Time Editing
Motion Speed Manipulation
Lightweight Temporal Module