Omni-Supervised Motion Editing: Balancing Change and Invariance through Positive-Negative Learning

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Text-driven human motion editing faces a dual challenge: precisely modifying target regions while preserving the coherence of unedited motion segments. This work proposes OmniME, an omni-supervised positive-negative learning framework that forms a unified supervision paradigm for this task. Within a diffusion Transformer architecture, OmniME combines retrospective feature supervision, which enforces coarse-to-fine consistency across layers, with a similarity-based motion preservation mechanism and triplet-based text-motion semantic alignment. On the MotionFix and STANCE Adjustment datasets, the method reports state-of-the-art editing alignment while balancing editing accuracy against motion consistency.

📝 Abstract

Text-based human motion editing aims to modify existing motion sequences according to natural language instructions while maintaining the consistency of the original motion. Existing diffusion-based approaches often rely on heuristic similarity cues or coarse global conditioning, leading to motion distortion and suboptimal semantic alignment. The key challenge lies in balancing change (i.e., precisely editing target regions) and invariance (i.e., preserving unedited parts). To address this challenge, we propose an Omni-Supervised Positive-Negative Learning framework, named OmniME. Our method integrates three complementary components: (1) retrospective feature supervision that enforces coarse-to-fine consistency across transformer layers, (2) a motion preservation mechanism that focuses on subtle variations according to the source-target similarity, and (3) triplet-based semantic alignment that strengthens text-motion correspondence. Together, these components form a unified supervision paradigm that balances change and invariance. Extensive experiments on the MotionFix and STANCE Adjustment datasets demonstrate that OmniME achieves state-of-the-art performance in editing alignment, validating the effectiveness of our unified learning framework. Our source code and models have been released at: https://github.com/rocket-ycyer/OmniME.git
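
To make component (3) more concrete, here is a minimal sketch of a generic triplet margin loss over embeddings. It is not taken from the OmniME code. The abstract does not say which samples serve as anchor, positive, and negative, so the pairing used here (instruction text as anchor, edited motion as positive, source motion as negative), the cosine distance, the margin value, and all function and variable names are assumptions made for illustration only.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Generic triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin` (a hypothetical value chosen here).
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical toy embeddings; real ones would come from learned encoders.
text_emb = [1.0, 0.0]           # anchor: the edit instruction (assumed role)
edited_motion_emb = [1.0, 0.0]  # positive: motion matching the instruction
source_motion_emb = [0.0, 1.0]  # negative: motion not matching it

well_aligned = triplet_loss(text_emb, edited_motion_emb, source_motion_emb)
misaligned = triplet_loss(text_emb, source_motion_emb, edited_motion_emb)
```

In this toy case the well-aligned triplet incurs no penalty, while swapping the positive and negative yields a positive loss, which is the pressure that pulls matching text-motion pairs together and pushes mismatched ones apart.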

Problem

Research questions and friction points this paper is trying to address.

text-based motion editing

change-invariance balance

motion consistency

semantic alignment

human motion editing

Innovation

Methods, ideas, or system contributions that make the work stand out.

positive-negative learning

motion editing

text-motion alignment