TAGA: Terrain-aware Active Gaze Learning for Generalizable Agile Humanoid Locomotion

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of combining wide-field perception with fine-grained geometric accuracy for humanoid robots navigating complex terrain. Inspired by human gaze behavior, we propose an active foveated learning framework that fuses visual input, proprioception, and motion commands. Using an attention mechanism, the policy network learns to focus on salient regions of the height scan without additional supervision, so that human-like gaze behavior emerges purely through reinforcement learning. The architecture increases the information density of observations, improves training efficiency, and broadens terrain adaptability. Generalization is demonstrated both in simulation and on a physical robot, on tasks including crossing a 1.2-meter gap, traversing sparse footholds and elevated platforms, and maintaining stable locomotion under severe perceptual disturbances and environmental interference.

📝 Abstract

Agile humanoid locomotion across diverse, challenging terrain demands both wide perceptual coverage and precise understanding of local geometry. Motivated by the way humans selectively look at relevant terrain during locomotion, we introduce TAGA, a Terrain-aware Active Gaze learning framework for Attention-based humanoid control. By fusing vision, proprioception, and motion commands, our framework guides the model to learn anticipatory cues and actively attend to specific areas of the height scan, passing only these informative regions to the downstream network. This adaptively increases the information density of observations under tight onboard computational constraints, enabling fine-grained perceptive locomotion over larger-scale terrains. We find that such gaze behaviors emerge naturally through reinforcement learning alone, without additional supervision or explicit guidance, and that they significantly improve training efficiency. As a result, the trained policy demonstrates robust and generalizable locomotion in simulation and on hardware, including reliable terrain-aware foothold selection, elevated-platform traversal, competitive sparse-foothold traversal, and the largest reported real-world gap traversal distance of 1.2 m among perceptive humanoid locomotion systems, while maintaining stability under severe perceptual disturbances and environmental interference.
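
The abstract does not spell out the network architecture, so the following is only a minimal sketch of the general idea of attending over a height scan, not the authors' implementation. It assumes the height scan is split into fixed-size patches, that a single query is formed from proprioception and the motion command, and that standard scaled dot-product attention scores the patches. The function name `gaze_attention`, the parameter names `W_q`, `W_k`, `W_v`, the patch size, and the top-k selection are all hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D array.
    e = np.exp(x - np.max(x))
    return e / e.sum()

def gaze_attention(height_scan, proprio, command, params, patch=4, top_k=3):
    """Hypothetical sketch: score height-scan patches with one query.

    height_scan: (H, W) array of terrain heights, H and W divisible by `patch`.
    proprio, command: 1-D arrays (assumed inputs, fused by concatenation).
    params: dict of projection matrices W_q, W_k, W_v (assumed names).
    """
    H, W = height_scan.shape
    # Split the scan into non-overlapping patches, one flattened token each.
    tokens = (height_scan
              .reshape(H // patch, patch, W // patch, patch)
              .transpose(0, 2, 1, 3)
              .reshape(-1, patch * patch))
    keys = tokens @ params["W_k"]      # (N, d)
    values = tokens @ params["W_v"]    # (N, d)
    query = np.concatenate([proprio, command]) @ params["W_q"]  # (d,)
    # Scaled dot-product attention over the N patches.
    weights = softmax(keys @ query / np.sqrt(query.shape[0]))
    context = weights @ values         # (d,) summary for a downstream network
    # Indices of the most attended patches, highest weight first.
    top = np.argsort(weights)[::-1][:top_k]
    return weights, context, top

# Toy usage with random inputs and random (untrained) projections.
rng = np.random.default_rng(0)
d = 8
params = {
    "W_k": rng.normal(size=(16, d)),
    "W_v": rng.normal(size=(16, d)),
    "W_q": rng.normal(size=(12 + 3, d)),
}
height_scan = rng.normal(size=(16, 16))
weights, context, top = gaze_attention(
    height_scan, rng.normal(size=12), rng.normal(size=3), params
)
```

In this sketch the attention weights play the role of a soft gaze over the terrain: with trained projections, patches that matter for the current command and body state would receive most of the weight. In a reinforcement learning setting these projections would be trained end to end with the policy, which is consistent with the abstract's claim that no separate gaze supervision is needed.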

Problem

Research questions and friction points this paper is trying to address.

humanoid locomotion

terrain-aware perception

active gaze

generalizable locomotion

challenging terrain

Innovation

Methods, ideas, or system contributions that make the work stand out.

active gaze learning

terrain-aware locomotion

humanoid control