🤖 AI Summary
This work addresses a common problem in text-to-motion generation: artifacts such as foot floating and penetration arise because semantic alignment and physical plausibility are hard to achieve simultaneously. It proposes the Distortion-aware Motion Calibrator (DMC), a self-supervised, data-driven post-hoc module that learns to recover physically plausible and semantically consistent motions without complex physics-based modeling. DMC takes an intentionally distorted motion and its original textual description as inputs, which allows plug-and-play use with diverse text-to-motion models. In the reported experiments, DMC reduces FID by 42.74% on T2M and 13.20% on T2M-GPT while achieving the highest R-Precision. Applied to MoMask, it reduces penetration by 33.0% and brings floating artifacts closer to the ground-truth reference.
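The self-supervised setup described above can be illustrated with a minimal sketch of how one training example might be assembled. This is not the authors' implementation: the Gaussian jitter, the function names `distort_motion` and `make_training_example`, the `noise_std` parameter, and the list-of-frames motion layout are all assumptions made for illustration, since the summary does not say how motions are distorted.

```python
import random


def distort_motion(motion, noise_std=0.05, seed=None):
    """Return a copy of `motion` with Gaussian noise added to every coordinate.

    `motion` is a list of frames; each frame is a list of (x, y, z) joint
    positions. Gaussian jitter is an assumed stand-in for whatever distortion
    scheme the paper actually uses.
    """
    rng = random.Random(seed)
    return [
        [tuple(c + rng.gauss(0.0, noise_std) for c in joint) for joint in frame]
        for frame in motion
    ]


def make_training_example(motion, text, noise_std=0.05, seed=None):
    """Build one self-supervised example.

    The calibrator would see the distorted motion and the text as inputs and
    be trained to reproduce the clean motion, so no extra labels are needed.
    """
    return {
        "input_motion": distort_motion(motion, noise_std, seed),
        "text": text,
        "target_motion": motion,
    }


# Toy data: two frames with two joints each (not taken from any dataset).
clean = [
    [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],
    [(0.1, 0.0, 0.0), (0.1, 1.0, 0.0)],
]
example = make_training_example(clean, "a person steps forward", seed=0)
```

Because the target is simply the undistorted motion, any existing text-motion dataset could supply training pairs under this assumption, which is consistent with the plug-and-play claim.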
📝 Abstract
Generating semantically aligned human motion from textual descriptions has made rapid progress, but ensuring both semantic and physical realism in motion remains a challenge. In this paper, we introduce the Distortion-aware Motion Calibrator (DMC), a post-hoc module that refines physically implausible motions (e.g., foot floating) while preserving semantic consistency with the original textual description. Rather than relying on complex physical modeling, we propose a self-supervised, data-driven approach in which DMC learns to produce physically plausible motions when given an intentionally distorted motion and the original textual description as inputs. We evaluate DMC as a post-hoc module that refines motions produced by various text-to-motion generation models and demonstrate that it improves physical plausibility while enhancing semantic consistency. The experimental results show that DMC reduces the FID score by 42.74% on T2M and 13.20% on T2M-GPT, while also achieving the highest R-Precision. When applied to high-quality models such as MoMask, DMC improves the physical plausibility of motions by reducing penetration by 33.0% and by adjusting floating artifacts closer to the ground-truth reference. These results highlight that DMC can serve as a promising post-hoc motion refinement framework for text-to-motion models by incorporating both textual semantics and physical plausibility.
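To make the penetration and floating figures above more concrete, the following sketch shows one simple way such ground-contact errors and a relative reduction could be computed. The definitions here are assumptions, not the paper's metrics: the ground is taken as a flat plane at height 0, the vertical axis is assumed to be index 1 (y-up), and the names `ground_contact_errors` and `relative_reduction` are hypothetical.

```python
def ground_contact_errors(motion, ground_height=0.0, up_axis=1):
    """Return (mean penetration, mean floating) across all frames.

    `motion` is a list of frames; each frame is a list of (x, y, z) joints.
    For each frame, the lowest joint is compared with the ground plane:
    a negative gap counts as penetration, a positive gap as floating.
    This is a simplified, assumed definition; it ignores legitimate
    airborne phases such as jumps.
    """
    if not motion:
        raise ValueError("motion must contain at least one frame")
    penetration = 0.0
    floating = 0.0
    for frame in motion:
        lowest = min(joint[up_axis] for joint in frame)
        gap = lowest - ground_height
        if gap < 0:
            penetration += -gap
        else:
            floating += gap
    n = len(motion)
    return penetration / n, floating / n


def relative_reduction(before, after):
    """Percentage reduction of a lower-is-better metric (e.g. FID)."""
    return 100.0 * (before - after) / before


# Toy motion: one frame sinks below the ground, one rests on it, one hovers.
toy = [
    [(0.0, -0.02, 0.0), (0.0, 0.9, 0.0)],
    [(0.0, 0.0, 0.0), (0.0, 0.9, 0.0)],
    [(0.0, 0.04, 0.0), (0.0, 0.9, 0.0)],
]
pen, flt = ground_contact_errors(toy)
```

A relative reduction such as the reported 33.0% would then correspond to `relative_reduction(pen_before, pen_after)` computed on motions before and after refinement, under whatever metric definition the paper actually adopts.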