Omni-Supervised Motion Editing: Balancing Change and Invariance through Positive-Negative Learning

📅 2026-05-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Text-driven human motion editing faces a dual challenge: it must precisely modify target regions while preserving the coherence of unedited motion segments. This work proposes OmniME, a framework that establishes a unified omni-supervised learning paradigm for this task. Built on a diffusion Transformer, OmniME combines three components: retrospective feature supervision, which enforces coarse-to-fine consistency across layers; a similarity-based motion preservation mechanism; and triplet-based text-motion semantic alignment. On the MotionFix and STANCE Adjustment benchmarks, the method outperforms existing approaches in editing alignment while balancing editing accuracy against motion consistency.
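The triplet-based text-motion alignment mentioned above can be illustrated with a generic triplet margin loss. The sketch below is a minimal, hypothetical illustration, not OmniME's released implementation: it assumes the text instruction, a matching (positive) edited motion, and a mismatched (negative) motion have already been encoded into fixed-length vectors. The names `cosine_similarity` and `triplet_alignment_loss` and the `margin` value of 0.2 are illustrative choices.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def triplet_alignment_loss(text_emb, pos_motion_emb, neg_motion_emb, margin=0.2):
    """Hinge-style triplet loss on cosine distance (hypothetical sketch).

    The text embedding is the anchor. The loss pulls it toward the positive
    (matching) motion and pushes it away from the negative motion until the
    negative is at least `margin` farther away than the positive.
    """
    d_pos = 1.0 - cosine_similarity(text_emb, pos_motion_emb)
    d_neg = 1.0 - cosine_similarity(text_emb, neg_motion_emb)
    return max(0.0, d_pos - d_neg + margin)
```

With `text_emb=[1.0, 0.0]`, a positive `[0.9, 0.1]`, and a negative `[0.0, 1.0]`, the loss is `0.0` because the negative is already more than `margin` farther from the text than the positive; swapping positive and negative yields a loss of about 1.19. The hinge form stops penalizing once the desired ranking holds, so well-separated pairs contribute nothing.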
📝 Abstract
Text-based human motion editing aims to modify existing motion sequences according to natural language instructions while maintaining the consistency of the original motion. Existing diffusion-based approaches often rely on heuristic similarity cues or coarse global conditioning, leading to motion distortion and suboptimal semantic alignment. The key challenge lies in balancing change (i.e., precisely editing target regions) and invariance (i.e., preserving unedited parts). To handle this challenge, we propose an Omni-Supervised Positive-Negative Learning framework, named OmniME. Our method integrates three complementary components: (1) retrospective feature supervision that enforces coarse-to-fine consistency across transformer layers, (2) a motion preservation mechanism that focuses on subtle variations according to the source-target similarity, and (3) triplet-based semantic alignment that strengthens text-motion correspondence. Together, these components form a unified supervision paradigm that balances change and invariance. Extensive experiments on the MotionFix and STANCE Adjustment datasets demonstrate that OmniME achieves state-of-the-art performance in editing alignment, validating the effectiveness of our unified learning framework. Our source code and models are available at: https://github.com/rocket-ycyer/OmniME.git
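The other two components named in the abstract can be sketched in the same hedged way. The code below is a plain-Python illustration under stated assumptions, not the authors' method. `preservation_loss` assumes that frames where source and target poses are nearly identical should be reproduced faithfully, using a Gaussian weight with a hypothetical temperature `tau`. `layer_consistency_loss` assumes that "retrospective" supervision means pulling shallower Transformer layers toward the deepest layer's features, with hypothetical per-layer weights `alphas`. Poses and features are plain lists of floats, and all function names are illustrative.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def preservation_weights(source, target, tau=1.0):
    """Hypothetical per-frame weights in (0, 1].

    Near 1 where source and target frames are almost identical (likely
    unedited), near 0 where the edit changes the pose substantially.
    """
    return [math.exp(-sq_dist(s, t) / tau) for s, t in zip(source, target)]


def preservation_loss(pred, source, target, tau=1.0):
    """Similarity-weighted penalty for drifting away from the source motion."""
    weights = preservation_weights(source, target, tau)
    per_frame = [sq_dist(p, s) for p, s in zip(pred, source)]
    return sum(w * d for w, d in zip(weights, per_frame)) / len(per_frame)


def layer_consistency_loss(layer_feats, alphas):
    """Pull each earlier layer's features toward the final layer's features.

    layer_feats: per-layer feature vectors ordered from shallow to deep.
    alphas: one hypothetical weight per earlier layer.
    In an autograd framework the final-layer target would typically be
    detached; plain Python has no gradients, so nothing extra is needed here.
    """
    final = layer_feats[-1]
    total = 0.0
    for alpha, feats in zip(alphas, layer_feats[:-1]):
        total += alpha * sq_dist(feats, final) / len(final)
    return total
```

For example, with source frames `[[0.0, 0.0], [1.0, 1.0]]` and a target whose second frame moves to `[3.0, 1.0]`, the first frame receives weight 1.0 and the edited frame about 0.018. A prediction that follows the edit in the second frame is therefore barely penalized there, while any drift in the first frame is penalized in full.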
Problem

Research questions and friction points this paper is trying to address.

text-based motion editing
change-invariance balance
motion consistency
semantic alignment
human motion editing
Innovation

Methods, ideas, or system contributions that make the work stand out.

positive-negative learning
motion editing
text-motion alignment
feature supervision
invariance preservation
Authors

Zhenwu Shi
Shanghai Institute of Artificial Intelligence for Education, East China Normal University, China
Jingyu Gong
Shanghai Jiao Tong University
3D Computer Vision
Peiwei Wang
School of Computer Science and Technology, East China Normal University, China
Xingzan Wang
School of Computer Science and Technology, East China Normal University, China
Tianwen Qian
East China Normal University
Multimedia, Vision and Language, Embodied AI
Wenxi Li
School of Statistics, East China Normal University, China
Yuan Fang
School of Computer Science and Technology, East China Normal University, China
Jiao Xie
School of Statistics, East China Normal University, China
Lizhuang Ma
School of Computer Science, Shanghai Jiao Tong University, China
Shaohui Lin
School of Computer Science and Technology, East China Normal University, China