🤖 AI Summary
Existing generative methods often neglect physical constraints, resulting in artifacts such as foot sliding and floating, and lack effective physics-based optimization for noisy motion data. This paper proposes a two-stage collaborative optimization framework that operates without ground-truth motion-capture data: it first synthesizes large-scale, diverse motion sequences, then refines them via physics-based imitation in a simulator such as MuJoCo or PyBullet, with gradient-based backward optimization of the generator. We introduce the first “motion-agnostic” physical optimization paradigm, establishing a closed-loop joint fine-tuning mechanism between generator and refiner, augmented by an adversarial physics projection loss. Evaluated on text-to-motion and music-to-dance generation, our method achieves state-of-the-art performance: foot sliding is reduced by 72%, floating frames decrease by 89%, and both physical plausibility and motion quality improve significantly.
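To make the adversarial component concrete, below is a minimal sketch of one plausible reading of the “adversarial physics projection loss”: a discriminator is trained to separate physics-refined motions from raw generated ones, and the generator is rewarded when its raw output passes as refined. The module names, pose dimension, and exact loss form are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of a GAN-style "physics projection" loss (assumed form).
import torch
import torch.nn as nn
import torch.nn.functional as F

POSE_DIM = 66  # assumed per-frame pose dimension (22 joints x 3D)

# Per-frame discriminator: refined motions are "real", raw motions are "fake".
disc = nn.Sequential(nn.Linear(POSE_DIM, 128), nn.ReLU(), nn.Linear(128, 1))

def adversarial_physics_losses(raw_motion, refined_motion):
    """raw_motion, refined_motion: (batch, frames, POSE_DIM) tensors."""
    real_logits = disc(refined_motion)          # physics-refined = target class
    fake_logits = disc(raw_motion.detach())     # raw generator output
    d_loss = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
              + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    # Generator term: raw motions should be judged as physically refined.
    g_logits = disc(raw_motion)
    g_loss = F.binary_cross_entropy_with_logits(g_logits, torch.ones_like(g_logits))
    return d_loss, g_loss

# Example call with dummy data:
raw, refined = torch.randn(4, 60, POSE_DIM), torch.randn(4, 60, POSE_DIM)
d_loss, g_loss = adversarial_physics_losses(raw, refined)
```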
📄 Abstract
Human motion generation plays a vital role in applications such as digital humans and humanoid robot control. However, most existing approaches disregard physical constraints, frequently producing physically implausible motions with pronounced artifacts such as floating and foot sliding. In this paper, we propose **Morph**, a **Mo**tion-f**r**ee **ph**ysics optimization framework comprising a Motion Generator and a Motion Physics Refinement module, which enhances physical plausibility without relying on costly real-world motion data. Specifically, the Motion Generator provides large-scale synthetic motion data, while the Motion Physics Refinement module uses these synthetic data to train a motion imitator within a physics simulator, enforcing physical constraints that project noisy motions into a physically plausible space. These physically refined motions are in turn used to fine-tune the Motion Generator, further enhancing its capability. Experiments on both text-to-motion and music-to-dance generation tasks demonstrate that our framework achieves state-of-the-art motion generation quality while drastically improving physical plausibility.
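As a rough illustration of the closed loop described above, the following sketch wires a stand-in generator and refiner together: the generator synthesizes motions, a drastically simplified refiner projects them toward plausibility, and the refined motions supervise the generator's fine-tuning. All architectures, dimensions, and hyperparameters here are assumptions; in the actual framework the refiner is a simulator-in-the-loop imitator, not an MLP.

```python
# Minimal sketch of the Morph-style generate -> refine -> fine-tune loop.
import torch
import torch.nn as nn

SEQ_LEN, N_JOINTS = 60, 22           # assumed motion format: 60 frames of 22 joints in 3D
MOTION_DIM = N_JOINTS * 3

class MotionGenerator(nn.Module):
    """Stand-in for any pretrained text/music-conditioned motion generator."""
    def __init__(self, cond_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(cond_dim, 256), nn.ReLU(),
                                 nn.Linear(256, SEQ_LEN * MOTION_DIM))

    def forward(self, cond):
        return self.net(cond).view(-1, SEQ_LEN, MOTION_DIM)

class PhysicsRefiner(nn.Module):
    """Stand-in for the simulator-trained imitator; here a residual MLP
    that nudges each frame toward a 'physically plausible' pose."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(MOTION_DIM, 256), nn.ReLU(),
                                 nn.Linear(256, MOTION_DIM))

    def forward(self, motion):
        return motion + self.net(motion)  # residual correction per frame

generator, refiner = MotionGenerator(), PhysicsRefiner()
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)

# Stage 1: the generator synthesizes large-scale motion data (no mocap needed).
cond = torch.randn(16, 128)                      # dummy conditioning signals
noisy_motion = generator(cond)

# Stage 2: the refiner projects the noisy motions into a plausible space.
refined_motion = refiner(noisy_motion).detach()  # treated as fixed targets

# Closed loop: refined motions supervise fine-tuning of the generator.
loss = nn.functional.mse_loss(generator(cond), refined_motion)
opt_g.zero_grad()
loss.backward()
opt_g.step()
print(f"fine-tune loss: {loss.item():.4f}")
```

In the real pipeline the detached refined motions play the role of pseudo ground truth, which is what lets the framework improve the generator without any captured motion data.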