AI Summary
Existing approaches struggle to transfer human motion styles to diverse motion content on humanoid robots while keeping the result physically feasible, and often produce motions that are kinematically implausible or dynamically unstable. This work proposes a bio-inspired generation-to-control framework that transfers the style of a short human demonstration to new motion content, with style intensity adjustable at inference time without retraining. The method combines a physics-aware multi-condition latent diffusion model that fuses style, content, and trajectory conditions; classifier-free guidance for tuning style intensity; and contact-consistency and temporal-smoothness regularization applied to decoded motions during training. The generated references are executed by a preview-based whole-body tracking policy trained with a cluster-and-distill strategy. On the Unitree G1 robot, the approach achieves a 96.0% success rate over 125 real-robot trials and reduces contact-penetration and jitter artifacts compared with animation-oriented style-transfer baselines.
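The contact-consistency and temporal-smoothness terms can be illustrated with a minimal NumPy sketch. The exact loss definitions are not given here, so the forms below are assumptions: contact consistency is modeled as a penalty on horizontal foot sliding while a foot is labeled as in contact plus a penalty on penetration below a flat ground plane at height `ground_z`, and temporal smoothness as the mean squared finite-difference acceleration of the decoded pose sequence. The function names, array shapes, and binary contact labels are all hypothetical.

```python
import numpy as np

def contact_consistency_loss(foot_pos, contact, dt, ground_z=0.0):
    """Assumed contact term for a decoded motion (hypothetical form).

    foot_pos: (T, F, 3) world-frame foot positions with z pointing up.
    contact:  (T, F) binary labels, 1 where a foot should be planted.
    """
    vel = np.diff(foot_pos, axis=0) / dt                      # (T-1, F, 3) finite-difference velocity
    planted = contact[1:] * contact[:-1]                      # in contact at both ends of the step
    slide = np.sum(vel[..., :2] ** 2, axis=-1) * planted      # horizontal sliding while planted
    penetration = np.maximum(ground_z - foot_pos[..., 2], 0.0) ** 2  # squared depth below ground
    return slide.mean() + penetration.mean()

def temporal_smoothness_loss(pose_seq, dt):
    """Mean squared second difference (acceleration) of a (T, D) pose sequence."""
    acc = (pose_seq[2:] - 2.0 * pose_seq[1:-1] + pose_seq[:-2]) / dt ** 2
    return np.mean(acc ** 2)

# Usage: a planted, above-ground foot and a constant-velocity pose track cost nothing.
T, F = 6, 2
feet = np.zeros((T, F, 3))
labels = np.ones((T, F))
ramp = np.arange(T, dtype=float)[:, None] * np.ones((1, 4))
print(contact_consistency_loss(feet, labels, dt=0.02))
print(temporal_smoothness_loss(ramp, dt=0.02))
```

In a training loop, terms like these would be weighted and added to the diffusion objective after decoding the latent back to joint space; the weighting scheme is likewise an assumption.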
Abstract
Expressive whole-body motion is important for humanoid robots operating in human environments, where robots are expected to move stably while presenting readable and adjustable body behaviors. However, most expressive motions are still obtained from fixed demonstrations or manually designed scripts, making it difficult to reuse a demonstrated style across different motion content. Inspired by the way human motion styles convey affective and intentional cues through gait rhythm, posture, arm swing, and body sway, this paper proposes a bionic generation-to-control framework for exemplar-driven style transfer on humanoid robots. Given a short human style exemplar and a target content motion, the proposed framework generates a stylized whole-body reference that preserves the intended motion content while transferring the demonstrated style. A physics-aware multi-condition latent diffusion model is developed to fuse style, content, and trajectory conditions, and classifier-free guidance is used to adjust the style intensity without retraining. To improve hardware executability, contact-consistency and temporal-smoothness regularization terms are imposed on decoded motions during training. The generated references are then converted into G1-compatible robot references and executed by a preview-based whole-body tracking policy trained with a cluster-and-distill strategy. Simulation and Unitree G1 experiments show that the proposed method can transfer short human style exemplars to diverse robot motion content, reduce contact and jitter artifacts compared with animation-oriented style-transfer baselines, and achieve a 96.0% success rate over 125 reported real-robot trials. The results demonstrate the feasibility of using short human motion exemplars as reusable bionic sources for physically executable expressive humanoid motion.
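Classifier-free guidance turns style intensity into a single inference-time scalar. A minimal sketch, assuming guidance is applied only along the style condition while the content and trajectory conditions are kept in both branches: the guided prediction is eps_no_style + w_style * (eps_style - eps_no_style), where eps_no_style is the denoiser output with the style condition dropped. The denoiser `toy_eps`, its signature, and the use of `None` as the null style token are hypothetical stand-ins for the trained latent diffusion model, not the paper's exact guidance rule.

```python
import numpy as np

def guided_noise(eps_model, z_t, t, content, traj, style, w_style):
    """Style-only classifier-free guidance (assumed form).

    eps_model(z_t, t, content, traj, style) is a hypothetical denoiser;
    passing style=None represents the dropped (null) style condition.
    """
    eps_no_style = eps_model(z_t, t, content, traj, None)  # content + trajectory only
    eps_style = eps_model(z_t, t, content, traj, style)    # all three conditions
    # w_style = 0 ignores the exemplar, 1 is plain conditional sampling,
    # values > 1 push the sample further toward the exemplar style.
    return eps_no_style + w_style * (eps_style - eps_no_style)

def toy_eps(z_t, t, content, traj, style):
    """Hypothetical linear 'denoiser' used only to make the example runnable."""
    base = 0.1 * z_t + content + traj
    return base if style is None else base + style

z_t = np.zeros(3)
content, traj, style = np.ones(3), np.zeros(3), np.full(3, 0.5)
for w in (0.0, 1.0, 2.0):
    print(w, guided_noise(toy_eps, z_t, 10, content, traj, style, w))
```

Because the guidance weight enters only at sampling time, changing it requires no retraining. Training a model for this kind of guidance additionally requires randomly dropping the style condition during training so that the null-style branch is learned.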