Global-Local Attention Decomposition for Terrain Encoding in Humanoid Perceptive Locomotion

📅 2026-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Humanoid robots struggle to combine global terrain understanding with precise local foothold selection in environments with sparse footholds and spatial constraints. This work proposes Global-Local Attention Decomposition (GLAD), a terrain encoder that explicitly separates these two perceptual roles. Operating on a robot-centric elevation map, GLAD uses a global branch based on attention pooling to summarize terrain context and a state-conditioned local attention branch to sparsify and encode foothold-relevant geometry. This coarse-to-fine architecture prevents fine-grained spatial cues from being diluted, reduces training overhead, and yields terrain-responsive behaviors without an explicit navigation planner. The policy handles gap crossing, stepping stones, and stair traversal, and transfers zero-shot from simulation to a Unitree G1 humanoid with onboard LiDAR, where it follows narrow paths and avoids obstacles under simple velocity commands.
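
To make the global branch concrete, here is a minimal sketch of attention pooling over elevation-map cells. It is an illustration under stated assumptions, not the paper's implementation: the `(x, y, height)` cell features, the single query vector, and the function names are all hypothetical, and in a real encoder the query would be learned rather than hand-set.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_pool(features, query):
    """Pool N feature vectors into one context vector using a single query.

    Scores are scaled dot products between the query and each feature;
    the output is the softmax-weighted average of the features.
    """
    dim = len(query)
    scores = [sum(q * f for q, f in zip(query, feat)) / math.sqrt(dim)
              for feat in features]
    weights = softmax(scores)
    pooled = [sum(w * feat[d] for w, feat in zip(weights, features))
              for d in range(dim)]
    return pooled, weights

# Toy 2x2 elevation patch, flattened to (x, y, height) per cell (assumed layout).
cells = [[-0.5, -0.5, 0.0], [0.5, -0.5, 0.0], [-0.5, 0.5, 0.2], [0.5, 0.5, -1.0]]
# Hypothetical query; this hand-set value simply favors higher cells.
query = [0.0, 0.0, 1.0]
context, weights = attention_pool(cells, query)
```

The pooled `context` is a single fixed-size vector regardless of how many cells the map contains, which is what makes pooling a natural fit for summarizing broad terrain context.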

📝 Abstract

Although reinforcement learning has significantly advanced humanoid locomotion, perceptive policies still struggle on sparse-foothold terrain and constrained environments. Success in these scenarios requires both broad terrain awareness and precise foothold selection, two perceptual roles that conventional encoders often entangle. To address this challenge, we propose Global-Local Attention Decomposition (GLAD) for terrain encoding in humanoid locomotion. Realized by a coarse-to-fine encoder over a robot-centric elevation map, GLAD explicitly separates these objectives: a global attention branch utilizes attention pooling to summarize the surrounding terrain context, while a state-conditioned local attention branch sparsifies and encodes precise foothold-relevant geometry. This explicit attention decomposition prevents the dilution of fine-grained spatial cues while reducing training overhead. Experiments demonstrate that GLAD enables reliable locomotion over challenging gaps, stepping stones, and stairs. Furthermore, the learned policy exhibits emergent terrain-responsive behaviors, autonomously following narrow paths and avoiding obstacles under simple velocity commands without explicit navigation planners. In real-world deployment on a Unitree G1 humanoid robot using onboard LiDAR, the proposed method achieves robust zero-shot sim-to-real transfer across diverse sparse-foothold and obstacle-rich domains.
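
The abstract does not specify how the local branch conditions on state or how it sparsifies the map, so the following is only a sketch of one plausible reading: the robot state is projected to a query, cells are scored against it, and only the top-k cells are kept before the softmax. The projection matrix `w_query`, the use of a commanded velocity as the state, and the top-k rule are assumptions made for illustration.

```python
import math

def local_attention(cells, state, w_query, k):
    """State-conditioned attention restricted to the k best-scoring cells.

    cells:   list of feature vectors, one per elevation-map cell
    state:   robot state vector (here, a commanded planar velocity)
    w_query: hypothetical projection from state to a query, one row per feature dim
    k:       number of cells kept after sparsification
    """
    query = [sum(w * s for w, s in zip(row, state)) for row in w_query]
    dim = len(query)
    scores = [sum(q * c for q, c in zip(query, cell)) / math.sqrt(dim)
              for cell in cells]
    # Sparsify: keep only the indices of the k highest-scoring cells.
    top = sorted(range(len(cells)), key=lambda i: scores[i], reverse=True)[:k]
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    total = sum(exps.values())
    weights = {i: e / total for i, e in exps.items()}
    encoded = [sum(weights[i] * cells[i][d] for i in top) for d in range(dim)]
    return encoded, weights

# Four cells along the x axis as (x, y, height); layout is an assumption.
cells = [[-1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 0.0, 0.3], [2.0, 0.0, -0.5]]
state = [1.0, 0.0]                      # commanded velocity: forward along +x
w_query = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]  # maps velocity onto the (x, y) dims
encoded, weights = local_attention(cells, state, w_query, k=2)
```

With a forward velocity command, the two cells furthest ahead receive all of the attention mass and the cells behind the robot are dropped entirely, which illustrates how sparsification can keep a few foothold-relevant cells from being averaged away by the rest of the map.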

Problem

Research questions and friction points this paper is trying to address.

humanoid locomotion

sparse-foothold terrain

perceptive policy

terrain encoding

attention decomposition

Innovation

Methods, ideas, or system contributions that make the work stand out.

Global-Local Attention Decomposition

terrain encoding

humanoid locomotion