🤖 AI Summary
To address poor policy transfer for humanoid robots on sparse-foothold terrain and inefficient training caused by sparse rewards, this paper proposes an end-to-end reinforcement learning framework. It introduces a sampling-based foot-contact reward for polygonal feet, a double-critic architecture, and a two-stage progressive training scheme that transfers a policy pretrained on flat ground to complex terrain. The system integrates real-time elevation mapping from onboard LiDAR, terrain-aware observation encoding, and dynamic motion control. In simulation, the approach significantly improves sample efficiency; on hardware, it achieves stable and precise gait execution on sparse footholds, with a disturbance-rejection success rate exceeding 92%. Notably, this work presents the first real-time closed-loop deployment of a learning-based controller on such challenging sparse-foothold terrains.
📝 Abstract
Traversing risky terrains with sparse footholds poses a significant challenge for humanoid robots, requiring precise foot placements and stable locomotion. Existing approaches designed for quadrupedal robots often fail to generalize to humanoid robots due to differences in foot geometry and unstable morphology, while learning-based approaches for humanoid locomotion still face great challenges on complex terrains due to sparse foothold reward signals and inefficient learning processes. To address these challenges, we introduce BeamDojo, a reinforcement learning (RL) framework designed to enable agile humanoid locomotion on sparse footholds. BeamDojo begins by introducing a sampling-based foothold reward tailored for polygonal feet, along with a double critic to balance the learning process between dense locomotion rewards and sparse foothold rewards. To encourage sufficient trial-and-error exploration, BeamDojo incorporates a two-stage RL approach: the first stage relaxes the terrain dynamics by training the humanoid on flat terrain while providing it with task-terrain perceptive observations, and the second stage fine-tunes the policy on the actual task terrain. Moreover, we implement an onboard LiDAR-based elevation map to enable real-world deployment. Extensive simulation and real-world experiments demonstrate that BeamDojo achieves efficient learning in simulation and enables agile locomotion with precise foot placement on sparse footholds in the real world, maintaining a high success rate even under significant external disturbances.
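To make the idea of a sampling-based foothold reward for polygonal feet more concrete, the following is a minimal sketch and not the paper's implementation. It assumes a rectangular sole, a regular grid of sample points, a terrain height lookup, and a penalty equal to the fraction of sample points with no terrain beneath them. The names `sample_sole_points`, `foothold_reward`, `beam_height`, and the tolerance `gap_tol` are hypothetical and chosen only for illustration.

```python
import math
from typing import Callable, List, Tuple

Point = Tuple[float, float]


def sample_sole_points(center: Point, yaw: float, length: float, width: float,
                       n_long: int = 5, n_lat: int = 3) -> List[Point]:
    """Return a grid of world-frame (x, y) points covering a rectangular sole.

    Assumption: the foot is modeled as a rectangle; n_long and n_lat must be >= 2.
    """
    cx, cy = center
    c, s = math.cos(yaw), math.sin(yaw)
    points = []
    for i in range(n_long):
        lx = (i / (n_long - 1) - 0.5) * length  # along the foot
        for j in range(n_lat):
            ly = (j / (n_lat - 1) - 0.5) * width  # across the foot
            # Rotate the local offset by yaw and translate to the foot center.
            points.append((cx + c * lx - s * ly, cy + s * lx + c * ly))
    return points


def foothold_reward(points: List[Point],
                    terrain_height: Callable[[float, float], float],
                    sole_height: float,
                    gap_tol: float = 0.03,
                    in_contact: bool = True) -> float:
    """Penalize the fraction of sole samples that hang over a gap.

    A sample counts as unsupported when the terrain lies more than gap_tol
    below the sole. The reward is 0 for a fully supported foot and -1 when
    no sample is supported. Feet in swing (in_contact=False) get no penalty.
    """
    if not in_contact or not points:
        return 0.0
    unsupported = sum(
        1 for (x, y) in points if sole_height - terrain_height(x, y) > gap_tol
    )
    return -unsupported / len(points)


def beam_height(x: float, y: float) -> float:
    """Hypothetical terrain: a 0.2 m wide beam along x with a 1 m drop beside it."""
    return 0.0 if abs(y) <= 0.1 else -1.0


# A foot centered on the beam versus one shifted toward the edge.
centered = sample_sole_points((0.0, 0.0), yaw=0.0, length=0.2, width=0.1)
shifted = sample_sole_points((0.0, 0.08), yaw=0.0, length=0.2, width=0.1)
print(foothold_reward(centered, beam_height, sole_height=0.0))
print(foothold_reward(shifted, beam_height, sole_height=0.0))
```

Sampling several points under the sole, rather than checking only the foot center, yields a graded penalty for partially overhanging placements, which is what makes such a reward suitable for flat, extended feet. In a double-critic setup like the one the abstract describes, a sparse term of this kind would be learned by a separate value function from the dense locomotion rewards; how the two are combined is not specified in this text.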