M3imic: Learning a Versatile Whole-Body Controller for Multimodal Motion Mimicking

📅 2026-06-03

🤖 AI Summary
Existing whole-body controllers for humanoid robots struggle to jointly handle heterogeneous motion reference modalities (robot joint trajectories, human poses, and end-effector poses) because dense joint-angle references and sparse end-effector poses are represented differently. This work proposes M3imic, a unified control framework that integrates all three modalities within a single architecture. Modality-specific encoders map each reference into a shared latent space, and a single policy is trained with large-scale reinforcement learning in simulation, so the same policy can follow any of the modalities without modality-specific retraining. Evaluated on the Unitree G1 platform, the method reaches a peak success rate of 98.42% on an unseen test dataset in simulation and transfers from simulation to the real robot across the reference modalities.
📝 Abstract
Building a general-purpose whole-body controller is essential for enabling diverse motion capabilities in humanoid robots across a wide range of downstream tasks, including locomotion and loco-manipulation. Different tasks rely on distinct motion reference modalities: locomotion primarily depends on coordinated robot joint trajectories, whereas manipulation requires precise end-effector trajectory tracking. Existing methods often overlook the representational mismatch between dense robot joint angles and sparse end-effector poses. To address this, we propose Multi-Modal Mimic (M3imic), a versatile multi-modal whole-body control framework that unifies heterogeneous motion reference modalities, including robot joint angles, human pose trajectories, and end-effector poses, using modality-specific encoders to map them into a shared latent space. Leveraging large-scale reinforcement learning in the simulator, we train a single policy that achieves sim-to-real transfer across multiple motion reference modalities without modality-specific retraining. Extensive simulation and real-world experiments on the Unitree G1 robot are conducted to evaluate the proposed framework. In simulation, the policy achieves a peak success rate of 98.42% on an unseen test dataset, demonstrating its exceptional generalization capability. The code is available at https://github.com/Renforce-Dynamics/MultiModalWBC
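The abstract describes modality-specific encoders that feed a shared latent space consumed by a single policy. The sketch below illustrates only that data flow and is not the authors' implementation. Every name and number in it is a hypothetical placeholder: `MultiModalController`, `MODALITY_DIMS`, the latent size, the input and action dimensions, and the single linear layer per component. Weights are random and untrained, and the reinforcement learning that the paper uses to train the policy is not shown.

```python
import math
import random

# All sizes below are illustrative assumptions, not values from the paper.
LATENT_DIM = 16   # assumed size of the shared latent space
PROPRIO_DIM = 10  # assumed size of the robot's proprioceptive state
ACTION_DIM = 29   # assumed number of joint targets produced by the policy

# Assumed flat input size of each motion reference modality.
MODALITY_DIMS = {
    "joint_angles": 29,     # one angle per joint (assumed count)
    "human_pose": 24 * 3,   # 24 keypoints x (x, y, z), assumed layout
    "end_effector": 2 * 7,  # two hands x (position + quaternion), assumed
}


def make_linear(in_dim, out_dim, rng):
    """Create (weights, bias) for a randomly initialised linear layer."""
    scale = 1.0 / math.sqrt(in_dim)
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    return weights, [0.0] * out_dim


def linear_tanh(layer, x):
    """Apply a linear layer followed by tanh to a flat list of floats."""
    weights, bias = layer
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


class MultiModalController:
    """Hypothetical controller: one encoder per modality, one shared policy."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        # Each modality gets its own encoder into the same latent size.
        self.encoders = {name: make_linear(dim, LATENT_DIM, rng)
                         for name, dim in MODALITY_DIMS.items()}
        # The policy sees only the latent and proprioception, never the
        # raw reference, so it is shared across all modalities.
        self.policy = make_linear(LATENT_DIM + PROPRIO_DIM, ACTION_DIM, rng)

    def encode(self, modality, reference):
        if len(reference) != MODALITY_DIMS[modality]:
            raise ValueError(f"unexpected reference size for {modality}")
        return linear_tanh(self.encoders[modality], reference)

    def act(self, modality, reference, proprio):
        latent = self.encode(modality, reference)
        return linear_tanh(self.policy, latent + list(proprio))


# Usage: the same policy weights handle references of every modality.
controller = MultiModalController()
proprio = [0.0] * PROPRIO_DIM
actions = {name: controller.act(name, [0.1] * dim, proprio)
           for name, dim in MODALITY_DIMS.items()}
```

Because the policy input has a fixed size regardless of which encoder produced the latent, switching the reference modality at run time only changes which encoder is called. This is one simple way to realise the idea of following several modalities "without modality-specific retraining"; the actual encoder and policy architectures are those in the linked repository.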
Problem

Research questions and friction points this paper is trying to address.

whole-body control
motion mimicry
multimodal representation
humanoid robots
sim-to-real transfer
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-modal motion mimicry
whole-body control
shared latent space
sim-to-real transfer
reinforcement learning
Authors

Zuxing Lu
School of Automation, Southeast University, Nanjing 210096, China

Ziang Zheng
Master, Tsinghua University
Reinforcement Learning, Robotics, Animation, Federated Learning

Yao Lyu
Postdoctoral researcher, Tsinghua University
autonomous driving, embodied AI, reinforcement learning

Jingyu Liu
AIMC Lab, School of Information, Renmin University of China
Video Editing, Online Handwriting Analysis, Sketch Analysis

Feihong Zhang
School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China

Song Lu
School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China

Xin Yuan
School of Automation, Southeast University, Nanjing 210096, China

Changyin Sun
School of Automation, Southeast University, Nanjing 210096, China

Xingxing Zuo
Assistant Professor, MBZUAI
Robotics, State Estimation, Embodied AI

Shengbo Eben Li
School of Vehicle and Mobility, Tsinghua University, Beijing 100084, China