M3imic: Learning a Versatile Whole-Body Controller for Multimodal Motion Mimicking

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing whole-body controllers for humanoid robots struggle to jointly handle heterogeneous motion reference modalities (robot joint trajectories, human poses, and end-effector poses) because dense joint-angle references and sparse end-effector references are represented very differently. This work proposes M3imic, a unified control framework that integrates all three modalities within a single architecture. Each modality is mapped into a shared latent space by a modality-specific encoder, and a single policy is trained with large-scale reinforcement learning in simulation, so the same policy serves every modality without modality-specific retraining. Evaluated on the Unitree G1 platform, the method reaches a peak success rate of 98.42% on an unseen test dataset in simulation and transfers multimodal motion skills from simulation to the real robot.

📝 Abstract

Building a general-purpose whole-body controller is essential for enabling diverse motion capabilities in humanoid robots across a wide range of downstream tasks, including locomotion and loco-manipulation. Different tasks rely on distinct motion reference modalities: locomotion primarily depends on coordinated robot joint trajectories, whereas manipulation requires precise end-effector trajectory tracking. Existing methods often overlook the representational mismatch between dense robot joint angles and sparse end-effector poses. To address this, we propose Multi-Modal Mimic (M3imic), a versatile multi-modal whole-body control framework that unifies heterogeneous motion reference modalities, including robot joint angles, human pose trajectories, and end-effector poses, using modality-specific encoders to map them into a shared latent space. Leveraging large-scale reinforcement learning in simulation, we train a single policy that achieves sim-to-real transfer across multiple motion reference modalities without modality-specific retraining. We evaluate the proposed framework through extensive simulation and real-world experiments on the Unitree G1 robot. In simulation, the policy achieves a peak success rate of 98.42% on an unseen test dataset, indicating strong generalization. The code is available at https://github.com/Renforce-Dynamics/MultiModalWBC
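The encoder-plus-shared-policy idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the layer sizes, the single-layer tanh encoders, the proprioception input, and the names `MODALITY_DIMS`, `encode`, and `act` are all assumptions chosen for illustration, and the weights are random rather than trained with reinforcement learning.

```python
import math
import random

# All sizes below are hypothetical and chosen only to keep the example small.
LATENT_DIM = 8    # assumed size of the shared latent space
PROPRIO_DIM = 6   # assumed size of the robot's proprioceptive observation
ACTION_DIM = 4    # assumed number of action outputs

# Assumed input size of each motion reference modality.
MODALITY_DIMS = {"joint_angles": 4, "human_pose": 9, "end_effector": 7}

_rng = random.Random(0)  # fixed seed so the sketch is reproducible


def init_linear(in_dim, out_dim):
    """Create a randomly initialised linear layer as (weights, bias)."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[_rng.gauss(0.0, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear_tanh(layer, x):
    """Apply a linear layer followed by tanh to the vector x."""
    weights, bias = layer
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


# One encoder per modality, all writing into the same latent size.
ENCODERS = {name: init_linear(dim, LATENT_DIM)
            for name, dim in MODALITY_DIMS.items()}

# A single policy head shared by every modality.
POLICY_HEAD = init_linear(LATENT_DIM + PROPRIO_DIM, ACTION_DIM)


def encode(modality, reference):
    """Map a modality-specific reference vector into the shared latent space."""
    if len(reference) != MODALITY_DIMS[modality]:
        raise ValueError(f"unexpected reference size for {modality}")
    return linear_tanh(ENCODERS[modality], reference)


def act(modality, reference, proprio):
    """Run the shared policy on the latent code plus proprioception."""
    latent = encode(modality, reference)
    return linear_tanh(POLICY_HEAD, latent + list(proprio))
```

Calling `act("end_effector", [0.1] * 7, [0.0] * 6)` and `act("joint_angles", [0.1] * 4, [0.0] * 6)` both return an action vector of the same length, because only the encoder changes with the modality while the policy head stays fixed. That is the property that lets one trained policy accept any of the reference types.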

Problem

Research questions and friction points this paper is trying to address.

whole-body control

motion mimicry

multimodal representation

humanoid robots

sim-to-real transfer

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-modal motion mimicry

whole-body control

shared latent space