Dynamic Motion Blending for Versatile Motion Editing

📅 2025-03-26
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Existing text-guided motion editing methods generalize poorly because they rely on pre-collected motion-text triplets. This work proposes MotionCutMix, an online data augmentation strategy, and MotionReFit, an autoregressive diffusion model that introduces a motion coordinator mechanism to resolve limb incoherence during dynamic motion splicing. By integrating online motion segment mixing, autoregressive temporal modeling, the motion coordinator module, and text-motion alignment representation learning, the framework enables high-degree-of-freedom spatiotemporal editing without requiring keyframes, additional annotations, or large language models. Extensive experiments on multiple text-guided motion editing benchmarks demonstrate state-of-the-art performance, with significant improvements in editing diversity, physical plausibility, and semantic fidelity.
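As a rough illustration (not the paper's implementation), the sketch below shows how a MotionCutMix-style augmentation could splice donor body-part motions into a source sequence to synthesize a training triplet on the fly. The joint-index table, array shapes, blending weight, and function names are all assumptions for illustration only.

```python
import numpy as np

# Hypothetical body-part -> joint-index table for a 22-joint, SMPL-like skeleton.
BODY_PARTS = {
    "left_arm":  [13, 16, 18, 20],
    "right_arm": [14, 17, 19, 21],
    "legs":      [1, 2, 4, 5, 7, 8, 10, 11],
}

def motion_cutmix(source, donor, parts, blend=0.9):
    """Splice the listed body parts of `donor` into `source`.

    source, donor: (T, J, D) joint-feature arrays of equal shape.
    blend: a weight near 1.0 keeps the splice mostly hard; residual
           seams are what the paper's motion coordinator would smooth.
    """
    edited = source.copy()
    for part in parts:
        idx = BODY_PARTS[part]
        edited[:, idx] = blend * donor[:, idx] + (1.0 - blend) * source[:, idx]
    return edited

# One synthesized training triplet: (source motion, edited motion, instruction).
T, J, D = 120, 22, 6
src, don = np.random.randn(T, J, D), np.random.randn(T, J, D)
triplet = (src, motion_cutmix(src, don, ["right_arm"]), "raise the right arm")
```

Because triplets are composed online from arbitrary motion pairs, the training distribution grows combinatorially with the motion library rather than being fixed by pre-collected annotations.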

๐Ÿ“ Abstract
Text-guided motion editing enables high-level semantic control and iterative modifications beyond traditional keyframe animation. Existing methods rely on limited pre-collected training triplets, which severely hinders their versatility in diverse editing scenarios. We introduce MotionCutMix, an online data augmentation technique that dynamically generates training triplets by blending body part motions based on input text. While MotionCutMix effectively expands the training distribution, the compositional nature introduces increased randomness and potential body part incoordination. To model such a rich distribution, we present MotionReFit, an auto-regressive diffusion model with a motion coordinator. The auto-regressive architecture facilitates learning by decomposing long sequences, while the motion coordinator mitigates the artifacts of motion composition. Our method handles both spatial and temporal motion edits directly from high-level human instructions, without relying on additional specifications or Large Language Models. Through extensive experiments, we show that MotionReFit achieves state-of-the-art performance in text-guided motion editing.
Problem

Research questions and friction points this paper is trying to address.

Expands motion editing versatility beyond limited training triplets
Addresses incoordination in dynamically blended body part motions
Enables text-guided spatial and temporal edits without LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online data augmentation with MotionCutMix
Auto-regressive diffusion model MotionReFit
Motion coordinator mitigates composition artifacts (see the sketch after this list)
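To make the last two points concrete, here is a highly simplified, hypothetical sketch of windowed autoregressive diffusion editing followed by a coordinator pass. The `denoiser` and `coordinator` callables are stand-ins, and the window size, step count, and conditioning tuple are assumptions rather than MotionReFit's actual interfaces.

```python
import torch

def edit_autoregressively(denoiser, coordinator, source, text_emb,
                          window=16, steps=50):
    """Edit a long motion window-by-window, conditioning on prior output.

    denoiser(x, t, cond): one reverse-diffusion step for the current window.
    coordinator(x): returns the window with cross-limb incoherence suppressed.
    source: (T, F) motion features; text_emb: embedded edit instruction.
    """
    T, _ = source.shape
    out = torch.zeros_like(source)
    for start in range(0, T, window):
        seg = source[start:start + window]          # window to edit
        ctx = out[max(0, start - window):start]     # autoregressive context
        x = torch.randn_like(seg)                   # each window starts from noise
        for t in reversed(range(steps)):            # simplified reverse process
            x = denoiser(x, t, (seg, ctx, text_emb))
        out[start:start + window] = coordinator(x)  # smooth composition artifacts
    return out

# Stand-ins so the sketch runs end to end; the real modules are learned networks.
denoiser = lambda x, t, cond: 0.98 * x
coordinator = lambda x: x
edited = edit_autoregressively(denoiser, coordinator,
                               torch.randn(128, 132), torch.randn(512))
```

Decomposing the sequence into windows keeps each denoising problem short, while the coordinator pass targets exactly the limb-incoherence artifacts that online motion composition introduces.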
Authors
Nan Jiang
Institute for AI, Peking University; State Key Laboratory of General Artificial Intelligence, BIGAI; Yuanpei College, Peking University
Hongjie Li
Peking University
Computer Graphics
Ziye Yuan
Institute for AI, Peking University
Zimo He
Ph.D. Student, Peking University
Computer Vision, Computer Graphics, Robotics
Yixin Chen
State Key Laboratory of General Artificial Intelligence, BIGAI
Tengyu Liu
Beijing Institute for General Artificial Intelligence
computer vision, human object interaction, human motion generation, grasping
Yixin Zhu
Assistant Professor, Peking University
Computer Vision, Visual Reasoning, Human-Robot Teaming
Siyuan Huang
State Key Laboratory of General Artificial Intelligence, BIGAI