Traversing the Narrow Path: A Two-Stage Reinforcement Learning Framework for Humanoid Beam Walking

📅 2025-08-28
🤖 AI Summary
Humanoid robots struggle to walk on narrow beams (e.g., 0.2 m wide) because contacts are sparse and safety-critical, and purely learned policies are fragile. This paper proposes a physics-informed two-stage reinforcement learning framework built around a footstep template derived from the extrapolated center of mass (XCoM) and the Linear Inverted Pendulum Model (LIPM). In Stage I, a low-level tracker is trained on flat ground to follow randomly perturbed template footsteps, which gives it stable contact scheduling and robust target tracking. In Stage II, a lightweight residual planner is trained on a simulated beam to refine each template step, using only IMU signals, joint encoder signals, and forward-facing elevation cues. Separating robust footstep tracking from fine-grained step adaptation is intended to improve safety, interpretability, and sim-to-real transfer. In simulation and on a Unitree G1, the method achieves a higher beam-crossing success rate, closer centerline adherence, and larger safety margins than template-only and end-to-end baselines.
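
For readers unfamiliar with the two models named in the summary, the standard textbook definitions are sketched below. These are the usual LIPM and XCoM relations, not equations quoted from the paper; the symbols (CoM position x, stance point p, constant CoM height z_c, gravity g) are notation assumed here for illustration, and the paper's exact template may differ.

```latex
% Linear Inverted Pendulum Model (one horizontal axis, constant CoM height z_c)
\ddot{x} = \omega_0^{2}\,(x - p), \qquad \omega_0 = \sqrt{g / z_c}

% Extrapolated center of mass (XCoM)
\xi = x + \frac{\dot{x}}{\omega_0}

% Differentiating \xi and substituting the LIPM dynamics:
\dot{\xi} = \dot{x} + \frac{\ddot{x}}{\omega_0}
          = \dot{x} + \omega_0\,(x - p)
          = \omega_0\,(\xi - p)
```

The last line shows why the XCoM is a natural footstep target: the XCoM moves away from the stance point p at a rate proportional to their separation, so a step placed at or near the XCoM arrests that divergence.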

📝 Abstract
Traversing narrow beams is challenging for humanoids due to sparse, safety-critical contacts and the fragility of purely learned policies. We propose a physically grounded, two-stage framework that couples an XCoM/LIPM footstep template with a lightweight residual planner and a simple low-level tracker. Stage-1 is trained on flat ground: the tracker learns to robustly follow footstep targets that are generated by adding small random perturbations to heuristic footsteps, without any hand-crafted centerline locking, so it acquires stable contact scheduling and strong target-tracking robustness. Stage-2 is trained in simulation on a beam: a high-level planner predicts a body-frame residual (Δx, Δy, Δψ) for the swing foot only, refining the template step to prioritize safe, precise placement under narrow support while preserving interpretability. To ease deployment, sensing is kept minimal and consistent between simulation and hardware: the planner consumes compact, forward-facing elevation cues together with onboard IMU and joint signals. On a Unitree G1, our system reliably traverses a 0.2 m-wide, 3 m-long beam. Across simulation and real-world studies, residual refinement consistently outperforms template-only and monolithic baselines in success rate, centerline adherence, and safety margins, while the structured footstep interface enables transparent analysis and low-friction sim-to-real transfer.
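
The footstep interface described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the Gaussian noise model and its scales, and the convention that template steps are stored as world-frame (x, y, yaw) tuples are all assumptions made for illustration.

```python
import math
import random


def perturb_footstep(step, sigma_xy=0.02, sigma_yaw=0.05, rng=random):
    """Stage-1-style target: heuristic footstep plus small random noise.

    The Gaussian distribution and the sigma values are assumed here;
    the abstract only says "small random perturbations".
    """
    x, y, yaw = step
    return (
        x + rng.gauss(0.0, sigma_xy),
        y + rng.gauss(0.0, sigma_xy),
        yaw + rng.gauss(0.0, sigma_yaw),
    )


def apply_residual(template_step, residual, body_yaw):
    """Stage-2-style refinement: add a body-frame residual to a template step.

    template_step: (x, y, yaw) in the world frame (assumed convention).
    residual: (dx, dy, dpsi) expressed in the body frame.
    body_yaw: heading of the body frame in the world frame, in radians.
    """
    x, y, yaw = template_step
    dx, dy, dpsi = residual
    c, s = math.cos(body_yaw), math.sin(body_yaw)
    # Rotate the planar residual from the body frame into the world frame.
    return (x + c * dx - s * dy, y + s * dx + c * dy, yaw + dpsi)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only to make the demo repeatable
    template = (0.30, 0.05, 0.0)
    print(perturb_footstep(template, rng=rng))
    print(apply_residual(template, (0.02, -0.03, 0.01), body_yaw=0.0))
```

Because the residual is added on top of an interpretable template, a zero residual reproduces the template step exactly, which makes it easy to inspect what the learned planner is actually changing.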
Problem

Research questions and friction points this paper is trying to address.

Enabling humanoid robots to traverse narrow beams safely
Overcoming sparse contacts and fragile learned policies
Ensuring precise foot placement and robust stability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage reinforcement learning with residual planner
Minimal sensing using onboard IMU and elevation cues
Sim-to-real transfer with structured footstep interface
TianChen Huang
Institute of Humanoid Robots, Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China
Wei Gao
Institute of Humanoid Robots, Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China
Runchen Xu
Institute of Humanoid Robots, Department of Precision Machinery and Precision Instrumentation, University of Science and Technology of China
Shiwu Zhang
University of Science and Technology of China
Robotics, Smart Materials, Terradynamics