MARCH: Model-Assisted Reinforcement Learning for the Perceptive Control of Humanoids over Sparse Footholds

📅 2026-06-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses safe, precise, and robust perceptive bipedal locomotion for humanoid robots on sparse footholds. The authors propose a model-assisted reinforcement learning (RL) framework with three stages. First, simplified models generate a safe reference trajectory. Second, a privileged teacher policy is trained with a reward derived from a control Lyapunov function (CLF) built around that reference. Third, the teacher is distilled into a vision-based student policy. By combining model-based structure with model-free RL, the approach improves sample efficiency, reduces the need for a complex learning curriculum, and yields physically grounded, smoother gaits, with stepping-stone performance comparable to model-free baselines. The method is validated in simulation and deployed on a Unitree G1 humanoid navigating sparse footholds with lateral constraints.
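To make the idea of a CLF-derived reward concrete, the following is a minimal sketch and not the paper's actual reward. It assumes a quadratic Lyapunov candidate V(e) = eᵀPe on the tracking error e between the robot state and the reference trajectory, a finite-difference estimate of its time derivative, and a penalty whenever the decrease condition V̇ + λV ≤ 0 is violated. The function names, the matrix `P`, and the parameters `decay_rate`, `sigma`, and `violation_weight` are all hypothetical.

```python
import numpy as np


def clf_value(error, P):
    """Quadratic Lyapunov candidate V(e) = e^T P e (assumed form)."""
    return float(error @ P @ error)


def clf_reward(error_prev, error_next, P, dt,
               decay_rate=1.0, sigma=1.0, violation_weight=1.0):
    """Illustrative CLF-style reward for one control step.

    error_prev, error_next: tracking errors w.r.t. the reference
    trajectory before and after the step (hypothetical inputs).
    """
    v_prev = clf_value(error_prev, P)
    v_next = clf_value(error_next, P)

    # Finite-difference estimate of dV/dt over the step.
    v_dot = (v_next - v_prev) / dt

    # CLF decrease condition: v_dot + decay_rate * V <= 0.
    # Only positive violations of the condition are penalized.
    violation = max(0.0, v_dot + decay_rate * v_prev)

    # Bounded tracking term that equals 1 when the error is zero.
    tracking = float(np.exp(-v_next / sigma))

    return tracking - violation_weight * violation
```

Under these assumptions, a step that shrinks the tracking error fast enough incurs no penalty and earns a reward close to 1, while a step that grows the error is penalized in proportion to how strongly it violates the decrease condition.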

📝 Abstract

Perceptive bipedal locomotion over sparse terrain remains a difficult challenge: model-based methods are precise but brittle to uncertainty, while model-free methods are robust but struggle to discover the precise, constrained motions required for safety-critical locomotion where small errors can cause catastrophic failures. We propose a model-assisted reinforcement learning (RL) framework that combines both perspectives in three steps: (1) generate a safe reference trajectory using simplified models; (2) train a privileged teacher policy guided by a control Lyapunov function (CLF) reward built around the safe reference trajectory; and (3) distill the teacher into a vision-based student policy. We show that this model-assistance procedure produces physically grounded locomotion, improving sample efficiency, reducing the need for a complex learning curriculum, and achieving smoother locomotion behavior alongside stepping stone performance comparable to model-free baselines. We validate our approach in simulation and demonstrate successful deployment on a Unitree G1 humanoid robot navigating sparse footholds with lateral constraints.
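Step (3), distilling the teacher into a student, can be illustrated with a simple supervised regression sketch. This is an assumption-laden toy and not the paper's training procedure: the student here is a linear map from observation features to actions, trained by gradient descent on the mean squared error against the teacher's actions. The name `distill_step` and all parameters are hypothetical, and a real vision-based student would use a neural network over image or height-map inputs.

```python
import numpy as np


def distill_step(student_weights, observations, teacher_actions,
                 learning_rate=0.1):
    """One gradient-descent step of action-matching distillation.

    student_weights: (obs_dim, act_dim) linear student (toy assumption).
    observations:    (batch, obs_dim) student inputs.
    teacher_actions: (batch, act_dim) actions chosen by the teacher.
    Returns the updated weights and the loss before the update.
    """
    predictions = observations @ student_weights
    residual = predictions - teacher_actions

    # Mean squared error over every action component in the batch.
    loss = float(np.mean(residual ** 2))

    # Gradient of that mean squared error w.r.t. the weights.
    gradient = 2.0 * observations.T @ residual / residual.size

    return student_weights - learning_rate * gradient, loss
```

In this toy setting the student typically cannot reach zero loss when its observations omit part of the teacher's privileged state, which mirrors why distillation targets the teacher's behavior and not its inputs.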

Problem

Research questions and friction points this paper is trying to address.

perceptive bipedal locomotion

sparse footholds

humanoid control

model-based vs model-free

safety-critical locomotion

Innovation

Methods, ideas, or system contributions that make the work stand out.

model-assisted reinforcement learning

control Lyapunov function

teacher-student policy distillation