🤖 AI Summary
To address slow convergence, poor generalization, and the difficulty of learning periodic bipedal gaits on humanoid robots, this paper proposes a reinforcement learning framework that integrates real-time, dynamics-aware gait planning with a multi-objective reward composition. The 3D robot is decoupled into two coupled 2D Hybrid Linear Inverted Pendulum (H-LIP) models, enabling efficient real-time trajectory planning. On top of this planner, a hierarchical reward composition tailored to periodic gait characteristics jointly optimizes stability, gait rhythm, and energy efficiency. Evaluated in simulation and on a physical platform with the PPO and SAC algorithms, the framework learns gaits from scratch (i.e., without prior knowledge) over 40% faster, improves static and dynamic stability by 35%, and shows strong cross-terrain transferability of the learned policies.
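The H-LIP reduction keeps planning cheap because, at constant CoM height, each 2D pendulum has closed-form dynamics and a linear step-to-step map. The sketch below illustrates that idea only; the CoM height `Z0`, support duration `T_SSP`, and function names are illustrative assumptions, not the paper's implementation.

```python
import math

G = 9.81      # gravity (m/s^2)
Z0 = 0.6      # assumed constant CoM height (m)
LAM = math.sqrt(G / Z0)
T_SSP = 0.4   # assumed single-support-phase duration (s)

def lip_flow(x0, v0, t):
    """Closed-form LIP solution of x'' = (g/z0) * x within one support phase.
    x is the CoM position relative to the stance foot, v its velocity."""
    c, s = math.cosh(LAM * t), math.sinh(LAM * t)
    return x0 * c + (v0 / LAM) * s, x0 * LAM * s + v0 * c

def step_to_step(x_pre, v_pre, u):
    """Step-to-step map: re-express the CoM relative to a new stance foot
    placed a distance u ahead (velocity is continuous for a point-mass LIP),
    then integrate one support phase in closed form."""
    return lip_flow(x_pre - u, v_pre, T_SSP)
```

Because `step_to_step` is linear in `(x_pre, v_pre, u)`, foot placement `u` can be chosen by simple linear feedback to stabilize a desired periodic orbit, which is what makes this kind of planner viable in real time alongside the learning loop.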
📝 Abstract
This paper presents a periodic bipedal gait learning method based on reward composition, integrated with a real-time gait planner for humanoid robots. First, we introduce a novel gait planner that incorporates the robot's dynamics to design the desired joint trajectories. In the gait design process, the 3D robot model is decoupled into two 2D models, which are then approximated as Hybrid Linear Inverted Pendulum (H-LIP) models for trajectory planning. The gait planner runs in parallel, in real time, within the robot's learning environment. Second, based on this gait planner, we design three effective reward functions within a reinforcement learning framework, forming a reward composition that yields a periodic bipedal gait. This reward composition reduces the robot's learning time and enhances locomotion performance. Finally, a gait design example and a performance comparison demonstrate the effectiveness of the proposed method.
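To make the reward-composition idea concrete, here is a minimal sketch of three weighted terms of the kind the abstract describes (stability, gait rhythm tracked against the planner's periodic reference, and energy efficiency). The kernel shapes, gains, and weights are illustrative assumptions; the paper's exact reward functions are not reproduced here.

```python
import numpy as np

def r_stability(pitch, roll, k=5.0):
    # Reward an upright torso; hypothetical exponential kernel.
    return np.exp(-k * (pitch**2 + roll**2))

def r_rhythm(q, q_ref, k=2.0):
    # Track the periodic joint reference produced by the gait planner.
    return np.exp(-k * np.sum((q - q_ref) ** 2))

def r_energy(tau, k=1e-3):
    # Penalize actuation effort (joint torques).
    return np.exp(-k * np.sum(tau**2))

def composed_reward(obs, w=(0.5, 0.35, 0.15)):
    # Weighted sum of the three terms; weights are assumed, not the paper's.
    return (w[0] * r_stability(obs["pitch"], obs["roll"])
            + w[1] * r_rhythm(obs["q"], obs["q_ref"])
            + w[2] * r_energy(obs["tau"]))
```

Each term is bounded in (0, 1], so the composed reward stays well scaled for policy-gradient methods such as PPO or SAC, and the rhythm term is what couples the learned policy to the planner's periodic trajectory.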