🤖 AI Summary
Current full-body teleoperation systems suffer from two critical bottlenecks: (1) decoupled upper- and lower-body control, which degrades interlimb coordination; and (2) open-loop operation, which lets global pose drift accumulate over time. To address these, we propose CLONE, a closed-loop, full-body teleoperation system that combines head-and-hand mixed-reality (MR) tracking with a Mixture-of-Experts (MoE)-driven control architecture. The method takes 6-DOF head and hand poses from an MR headset as master input and integrates them with real-time whole-body kinematic optimization, online error compensation, and closed-loop feedback control. This keeps global positional drift low and produces skill-coordinated motion from head-and-hand inputs only. Experiments on complex tasks such as "ground object retrieval" demonstrate a 92% reduction in positional drift and a threefold increase in single-session duration. This work demonstrates high-fidelity, long-duration, naturally coordinated teleoperation for humanoid robots.
📝 Abstract
Humanoid teleoperation plays a vital role in demonstrating and collecting data for complex humanoid-scene interactions. However, current teleoperation systems face critical limitations: they decouple upper- and lower-body control to maintain stability, restricting natural coordination, and operate open-loop without real-time position feedback, leading to accumulated drift. The fundamental challenge is achieving precise, coordinated whole-body teleoperation over extended durations while maintaining accurate global positioning. Here we show that an MoE-based teleoperation system, CLONE, with closed-loop error correction enables unprecedented whole-body teleoperation fidelity, maintaining minimal positional drift over long-range trajectories using only head and hand tracking from an MR headset. Unlike previous methods that either sacrifice coordination for stability or suffer from unbounded drift, CLONE learns diverse motion skills while preventing tracking error accumulation through real-time feedback, enabling complex coordinated movements such as "picking up objects from the ground." These results establish a new milestone for whole-body humanoid teleoperation for long-horizon humanoid-scene interaction tasks.
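The contrast between open-loop drift and closed-loop correction can be illustrated with a toy one-dimensional sketch. This is not CLONE's controller: it assumes a simple proportional correction on a measured global position error, and the function name `simulate`, the constant per-step `bias`, and the `gain` parameter are all hypothetical choices made for illustration.

```python
def simulate(steps, bias, gain):
    """Track a 1-D reference that advances 1 unit per step.

    bias: hypothetical constant per-step tracking error (e.g. foot slip).
    gain: proportional feedback gain on the measured global position error;
          gain=0.0 reproduces open-loop operation.
    Returns the absolute position error after the final step.
    """
    target = 0.0
    actual = 0.0
    for _ in range(steps):
        # Global position error measured before issuing this step's command.
        error = target - actual
        target += 1.0
        # Nominal step plus a correction proportional to the measured error.
        command = 1.0 + gain * error
        # The executed motion deviates from the command by the bias.
        actual += command + bias
    return abs(target - actual)


if __name__ == "__main__":
    open_loop = simulate(steps=1000, bias=0.01, gain=0.0)
    closed_loop = simulate(steps=1000, bias=0.01, gain=0.5)
    print(f"open-loop error:   {open_loop:.4f}")
    print(f"closed-loop error: {closed_loop:.4f}")
```

In this toy model the open-loop error grows linearly with the number of steps (it equals `steps * bias`), whereas with feedback the error follows e' = (1 - gain) * e - bias and settles near `bias / gain` regardless of trajectory length.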