🤖 AI Summary
This work addresses the lack of scalable and reproducible simulation benchmarks for whole-body humanoid loco-manipulation. It presents SIMPLE, a unified simulation platform with 60 whole-body tasks, 50 indoor scenes, and over 1,000 object assets. SIMPLE couples MuJoCo's contact-rich dynamics with IsaacSim's photorealistic rendering. Data can be collected at scale through two pipelines: automated trajectory generation via motion planning and a low-latency VR teleoperation interface. The platform provides visually realistic, contact-rich simulation of humanoid mobile manipulation and enables consistent, large-scale evaluation of lightweight imitation networks, vision-language-action (VLA) models, and world action models (WAMs). Experiments show a strong correlation between policy performance in simulation and in the real world. Policies trained on data collected in SIMPLE also transfer zero-shot to physical humanoid robots under similar settings.
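The text does not say how SIMPLE keeps the two engines in sync. A common pattern for pairing a physics engine with a separate renderer is to step physics at a fine timestep and push body poses to the renderer at a lower frame rate. The sketch below shows only that pattern. `PhysicsSim` and `Renderer` are hypothetical stand-ins that do not call MuJoCo or IsaacSim, and the 2 ms timestep and 50 Hz render rate are assumptions. This is not SIMPLE's implementation.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins: neither class reflects SIMPLE's, MuJoCo's, or IsaacSim's API.


@dataclass
class PhysicsSim:
    """Toy stepper standing in for a MuJoCo-style physics engine."""
    dt: float = 0.002  # physics timestep in seconds (assumed value)
    time: float = 0.0
    poses: dict = field(default_factory=lambda: {"box": [0.0, 0.0, 1.0]})
    z_velocity: float = -0.5  # constant downward velocity, illustrative only

    def step(self) -> None:
        # Advance simulated time and move the single body, clamped at the floor.
        self.time += self.dt
        x, y, z = self.poses["box"]
        self.poses["box"] = [x, y, max(0.0, z + self.z_velocity * self.dt)]


@dataclass
class Renderer:
    """Records pose snapshots in place of updating a photorealistic scene."""
    frames: list = field(default_factory=list)

    def sync_and_render(self, t: float, poses: dict) -> None:
        # Copy poses, since real engines often update pose buffers in place.
        self.frames.append((t, {name: list(p) for name, p in poses.items()}))


def run(sim: PhysicsSim, renderer: Renderer, duration: float, render_hz: float) -> list:
    """Step physics every dt; sync poses to the renderer once per render frame."""
    steps_per_frame = max(1, round(1.0 / (render_hz * sim.dt)))
    n_steps = round(duration / sim.dt)
    for i in range(1, n_steps + 1):
        sim.step()
        if i % steps_per_frame == 0:
            renderer.sync_and_render(sim.time, sim.poses)
    return renderer.frames
```

Under these assumed rates, `run(PhysicsSim(), Renderer(), duration=1.0, render_hz=50)` performs 500 physics steps and records 50 frames. Keeping rendering out of the physics inner loop lets the contact dynamics use a small timestep without paying rendering cost at every step.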
📝 Abstract
Humanoid foundation models are advancing faster than we can evaluate them. Real-world testing is expensive and difficult to reproduce, yet existing simulation benchmarks focus primarily on tabletop or wheeled robots. A scalable and reproducible benchmark for whole-body humanoid loco-manipulation therefore remains an open problem. To this end, we present SIMPLE, a unified simulation testbed for humanoid policy learning and evaluation. SIMPLE couples the accurate contact-rich dynamics of MuJoCo with the photorealistic rendering of IsaacSim. It provides a large-scale environment comprising 60 diverse whole-body tasks, 50 indoor scenes, and over 1,000 object assets. To facilitate scalable data collection, the framework integrates two data generation pipelines: automated trajectory generation via motion planning and a low-latency VR teleoperation interface. We further integrate and benchmark mainstream humanoid policies at scale in SIMPLE, ranging from lightweight imitation networks to large vision-language-action (VLA) models and recent world action models (WAMs). Our experiments reveal a strong correlation between policy performance in simulation and in the real world. Furthermore, we demonstrate that policies trained on data collected in SIMPLE can be transferred zero-shot to physical humanoid robots under similar settings, providing a robust and reproducible foundation for humanoid robotics research.
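The abstract reports a strong sim-to-real correlation but does not name the metric. Assume each policy has one success rate in simulation and one in the real world. Two common ways to quantify agreement are then the Pearson correlation coefficient and the fraction of policy pairs that the two settings rank in the same order. The sketch computes both. The dict-of-success-rates input format and the choice of these two metrics are assumptions, not SIMPLE's evaluation protocol.

```python
from math import sqrt


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def sim_real_pearson(sim: dict, real: dict) -> float:
    """Correlate per-policy success rates over policies evaluated in both settings."""
    keys = sorted(sim.keys() & real.keys())
    return pearson([sim[k] for k in keys], [real[k] for k in keys])


def ranking_agreement(sim: dict, real: dict) -> float:
    """Fraction of policy pairs ordered the same way in sim and real (ties disagree)."""
    keys = sorted(sim.keys() & real.keys())
    pairs = [(a, b) for i, a in enumerate(keys) for b in keys[i + 1:]]
    if not pairs:
        raise ValueError("need at least two shared policies")
    agree = sum(1 for a, b in pairs if (sim[a] - sim[b]) * (real[a] - real[b]) > 0)
    return agree / len(pairs)
```

Pearson correlation measures linear agreement in the success rates themselves. Ranking agreement asks only whether simulation orders policies the same way as real-world testing, which is often what a benchmark is used for.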