A Self-Supervised Approach on Motion Calibration for Enhancing Physical Plausibility in Text-to-Motion

📅 2026-02-20
🤖 AI Summary
This work addresses a common challenge in text-to-motion generation: artifacts such as foot sliding arise because semantic alignment and physical plausibility are difficult to achieve simultaneously. To this end, we propose the Distortion-aware Motion Calibrator (DMC), a self-supervised, data-driven post-processing module that learns to recover physically plausible and semantically consistent motions without requiring complex physics-based modeling. DMC is trained on synthetically distorted motions paired with their original textual descriptions, which enables plug-and-play integration with diverse text-to-motion models. Experiments demonstrate significant improvements: when applied to T2M and T2M-GPT, DMC reduces FID by 42.74% and 13.20%, respectively, while achieving the highest R-Precision. Furthermore, integration with MoMask yields a 33.0% reduction in penetration, substantially mitigating prevalent physical distortions.

📝 Abstract
Generating semantically aligned human motion from textual descriptions has made rapid progress, but ensuring both semantic and physical realism in motion remains a challenge. In this paper, we introduce the Distortion-aware Motion Calibrator (DMC), a post-hoc module that refines physically implausible motions (e.g., foot floating) while preserving semantic consistency with the original textual description. Rather than relying on complex physical modeling, we propose a self-supervised and data-driven approach in which DMC learns to produce physically plausible motions when given an intentionally distorted motion and the original textual description as inputs. We evaluate DMC as a post-hoc module for improving motions obtained from various text-to-motion generation models and demonstrate its effectiveness in improving physical plausibility while enhancing semantic consistency. The experimental results show that DMC reduces the FID score by 42.74% on T2M and 13.20% on T2M-GPT, while also achieving the highest R-Precision. When applied to high-quality models like MoMask, DMC improves the physical plausibility of motions by reducing penetration by 33.0% and by adjusting floating artifacts closer to the ground-truth reference. These results highlight that DMC can serve as a promising post-hoc motion refinement framework for any text-to-motion model by incorporating textual semantics and physical plausibility.
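The self-supervised setup described in the abstract (a clean motion is intentionally distorted, and the distorted motion plus its text becomes the input while the clean motion is the target) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's method: the function names `distort_motion` and `make_training_pair`, the `(frames, joints, 3)` layout, the y-up convention, and the particular distortion (Gaussian jitter plus a vertical shift) are all hypothetical.

```python
import numpy as np


def distort_motion(motion, rng, noise_std=0.02, max_offset=0.05):
    """Return a synthetically distorted copy of `motion`.

    Hypothetical distortion, chosen only for illustration:
    per-joint Gaussian jitter plus one constant vertical shift, which
    can mimic floating (positive shift) or ground penetration (negative).
    `motion` is assumed to have shape (frames, joints, 3) with y as up.
    """
    motion = np.asarray(motion, dtype=np.float64)
    jitter = rng.normal(0.0, noise_std, size=motion.shape)
    offset = rng.uniform(-max_offset, max_offset)
    distorted = motion + jitter      # new array; the input is not modified
    distorted[..., 1] += offset      # shift the assumed up axis
    return distorted


def make_training_pair(motion, text, rng):
    """Build one self-supervised example: (distorted motion, text) -> clean motion."""
    clean = np.asarray(motion, dtype=np.float64)
    return {
        "input_motion": distort_motion(clean, rng),
        "text": text,
        "target_motion": clean,
    }


rng = np.random.default_rng(0)
clean_motion = np.zeros((4, 22, 3))  # placeholder clip: 4 frames, 22 joints
pair = make_training_pair(clean_motion, "a person walks forward", rng)
```

No manual labels are needed in this setup: the supervision signal is the original motion itself, so any motion-text dataset can supply training pairs for a calibrator of this kind.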
Problem

Research questions and friction points this paper is trying to address.

text-to-motion
physical plausibility
motion calibration
semantic consistency
human motion generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

self-supervised learning
motion calibration
physical plausibility
text-to-motion
post-hoc refinement
Gahyeon Shim
Artificial Intelligence Graduate School (AIGS), Ulsan National Institute of Science and Technology (UNIST), Ulsan, Korea
Soogeun Park
Artificial Intelligence Graduate School (AIGS), Ulsan National Institute of Science and Technology (UNIST), Ulsan, Korea
Hyemin Ahn
POSTECH
Robotics, Human Robot Interaction