Training-Free Motion Customization for Distilled Video Generators with Adaptive Test-Time Distillation

📅 2025-06-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing training-free motion customization methods struggle to adapt to distilled video generation models: the accelerated sampling schedules and large denoising steps of distilled models severely degrade motion fidelity. This paper introduces MotionEcho, the first training-free motion customization framework designed specifically for distilled video models, enabling efficient, high-fidelity motion transfer via test-time adaptive distillation. Its core contributions are: (1) a diffusion teacher-forcing mechanism in which a high-quality, slower teacher model guides the fast student model's inference; (2) endpoint prediction coupled with interpolation guidance to steer the student's denoising trajectory; and (3) dynamic timestep allocation that balances inference speed and guidance accuracy. Evaluated across multiple distilled video models and benchmarks, MotionEcho significantly improves motion fidelity and generation quality while preserving high inference efficiency, without any fine-tuning or retraining.

📝 Abstract
Distilled video generation models offer fast and efficient synthesis but struggle with motion customization when guided by reference videos, especially under training-free settings. Existing training-free methods, originally designed for standard diffusion models, fail to generalize due to the accelerated generative process and large denoising steps in distilled models. To address this, we propose MotionEcho, a novel training-free test-time distillation framework that enables motion customization by leveraging diffusion teacher forcing. Our approach uses high-quality, slow teacher models to guide the inference of fast student models through endpoint prediction and interpolation. To maintain efficiency, we dynamically allocate computation across timesteps according to guidance needs. Extensive experiments across various distilled video generation models and benchmark datasets demonstrate that our method significantly improves motion fidelity and generation quality while preserving high efficiency. Project page: https://euminds.github.io/motionecho/
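The teacher-forcing scheme described in the abstract, where a slow teacher guides a fast student through endpoint prediction and interpolation, can be sketched in miniature as follows. This is a toy illustration under stated assumptions, not the paper's implementation: the denoiser is a dummy 1-D function, and all names (`student_step`, `teacher_step`, `guided_sample`, `guidance_weight`, `guide_mask`) are hypothetical.

```python
import math

def student_step(x, t, dt):
    # Stand-in for a distilled model's large denoising step: it predicts
    # the clean endpoint x0 directly and jumps a fraction dt/t toward it.
    x0_pred = x * math.exp(-t)  # dummy endpoint prediction
    return x + (x0_pred - x) * (dt / t) if t > 0 else x0_pred

def teacher_step(x, t, dt, substeps=4):
    # Stand-in teacher: the same trajectory traversed in smaller steps,
    # mimicking a slow, high-quality diffusion model.
    for _ in range(substeps):
        x = student_step(x, t, dt / substeps)
        t -= dt / substeps
    return x

def guided_sample(x, timesteps, guidance_weight=0.5, guide_mask=None):
    """Teacher-guided sampling: at guided timesteps, interpolate between
    the student's and the teacher's predicted endpoints."""
    for i, t in enumerate(timesteps[:-1]):
        dt = t - timesteps[i + 1]
        x_student = student_step(x, t, dt)
        if guide_mask is None or guide_mask[i]:
            x_teacher = teacher_step(x, t, dt)
            # Interpolation guidance between student and teacher outputs.
            x = (1 - guidance_weight) * x_student + guidance_weight * x_teacher
        else:
            x = x_student  # unguided step: pure distilled sampling
    return x
```

With `guidance_weight=0` (or an all-`False` mask) this reduces to plain distilled sampling; raising the weight pulls each step toward the teacher's finer-grained trajectory at the cost of extra teacher evaluations.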
Problem

Research questions and friction points this paper is trying to address.

Customizing motion in distilled video generators without training
Overcoming limitations of training-free methods in fast models
Balancing motion fidelity and efficiency in video generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Training-free motion customization via adaptive distillation
Teacher forcing with endpoint prediction and interpolation
Dynamic computation allocation for efficiency
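The dynamic computation allocation above can be sketched as a budgeted selection problem: spend a fixed budget of expensive teacher calls on the timesteps where guidance is needed most. This is a minimal sketch under the assumption that a per-timestep "guidance need" score is available; the scoring function and the name `allocate_guidance` are illustrative, not from the paper.

```python
def allocate_guidance(need_scores, budget):
    """Return a boolean mask selecting the `budget` timesteps with the
    highest guidance need; ties are broken toward earlier timesteps."""
    order = sorted(range(len(need_scores)),
                   key=lambda i: (-need_scores[i], i))
    chosen = set(order[:budget])
    return [i in chosen for i in range(len(need_scores))]
```

Such a mask could then gate the teacher calls in the sampling loop, so total cost stays close to the student-only baseline while guidance concentrates on the motion-critical steps.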