🤖 AI Summary
To address slow convergence, poor generalization, and the difficulty of learning periodic bipedal gaits on humanoid robots, this paper proposes a reinforcement learning framework that integrates real-time, dynamics-aware gait planning with a multi-objective reward composition. The 3D robot is decoupled into two coupled 2D Hybrid Linear Inverted Pendulum (H-LIP) models, enabling efficient real-time trajectory planning. On top of this planner, a hierarchical reward composition tailored to periodic gait characteristics jointly optimizes stability, gait rhythm, and energy efficiency. Evaluated in simulation and on a physical platform with the PPO and SAC algorithms, the framework learns gaits from scratch (i.e., without prior knowledge) over 40% faster, improves static and dynamic stability by 35%, and shows strong cross-terrain transferability of the learned policies.
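The H-LIP reduction keeps planning cheap because, at constant CoM height, each 2D pendulum has closed-form dynamics and a linear step-to-step map. The sketch below illustrates that idea only; the CoM height `Z0`, support duration `T_SSP`, and function names are illustrative assumptions, not the paper's implementation.

```python
import math

G = 9.81      # gravity (m/s^2)
Z0 = 0.6      # assumed constant CoM height (m)
LAM = math.sqrt(G / Z0)
T_SSP = 0.4   # assumed single-support-phase duration (s)

def lip_flow(x0, v0, t):
    """Closed-form LIP solution of x'' = (g/z0) * x within one support phase.
    x is the CoM position relative to the stance foot, v its velocity."""
    c, s = math.cosh(LAM * t), math.sinh(LAM * t)
    return x0 * c + (v0 / LAM) * s, x0 * LAM * s + v0 * c

def step_to_step(x_pre, v_pre, u):
    """Step-to-step map: re-express the CoM relative to a new stance foot
    placed a distance u ahead (velocity is continuous for a point-mass LIP),
    then integrate one support phase in closed form."""
    return lip_flow(x_pre - u, v_pre, T_SSP)
```

Because `step_to_step` is linear in `(x_pre, v_pre, u)`, foot placement `u` can be chosen by simple linear feedback to stabilize a desired periodic orbit, which is what makes this kind of planner viable in real time alongside the learning loop.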
📝 Abstract
This paper presents a periodic bipedal gait learning method based on reward composition, integrated with a real-time gait planner for humanoid robots. First, we introduce a novel gait planner that incorporates the robot's dynamics to design the desired joint trajectories. In the gait design process, the 3D robot model is decoupled into two 2D models, which are then approximated as Hybrid Linear Inverted Pendulum (H-LIP) models for trajectory planning. The gait planner runs in parallel, in real time, within the robot's learning environment. Second, based on this gait planner, we design three effective reward functions within a reinforcement learning framework, forming a reward composition that yields a periodic bipedal gait. This reward composition reduces the robot's learning time and enhances locomotion performance. Finally, a gait design example and a performance comparison demonstrate the effectiveness of the proposed method.
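To make the reward-composition idea concrete, here is a minimal sketch of three weighted terms of the kind the abstract describes (stability, gait rhythm tracked against the planner's periodic reference, and energy efficiency). The kernel shapes, gains, and weights are illustrative assumptions; the paper's exact reward functions are not reproduced here.

```python
import numpy as np

def r_stability(pitch, roll, k=5.0):
    # Reward an upright torso; hypothetical exponential kernel.
    return np.exp(-k * (pitch**2 + roll**2))

def r_rhythm(q, q_ref, k=2.0):
    # Track the periodic joint reference produced by the gait planner.
    return np.exp(-k * np.sum((q - q_ref) ** 2))

def r_energy(tau, k=1e-3):
    # Penalize actuation effort (joint torques).
    return np.exp(-k * np.sum(tau**2))

def composed_reward(obs, w=(0.5, 0.35, 0.15)):
    # Weighted sum of the three terms; weights are assumed, not the paper's.
    return (w[0] * r_stability(obs["pitch"], obs["roll"])
            + w[1] * r_rhythm(obs["q"], obs["q_ref"])
            + w[2] * r_energy(obs["tau"]))
```

Each term is bounded in (0, 1], so the composed reward stays well scaled for policy-gradient methods such as PPO or SAC, and the rhythm term is what couples the learned policy to the planner's periodic trajectory.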