STRIDE: Automating Reward Design, Deep Reinforcement Learning Training and Feedback Optimization in Humanoid Robotics Locomotion

📅 2025-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address a key bottleneck in humanoid robot gait control, namely that deep reinforcement learning (DRL) reward function design relies heavily on manual tuning and domain expertise, this paper proposes the first end-to-end reward auto-generation framework integrating agentic engineering with large language models (LLMs). The method requires no task-specific prompts or templates, enabling zero-shot reward generation and context-driven iterative refinement, and it interfaces directly with standard DRL algorithms such as PPO and SAC. Evaluated in simulation environments spanning multiple humanoid robot morphologies, it significantly outperforms EUREKA: it achieves high-dynamic, sprint-level gaits on complex terrains while improving training efficiency by 37% and task performance by 29%. Its core contribution is the systematic integration of LLM capabilities (code generation, zero-shot reasoning, and in-context optimization) into reward engineering, establishing a fully automated closed loop from reward specification to policy learning.
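The closed loop the summary describes (an LLM writes reward code, a standard DRL algorithm trains on it, and training results feed back as context for the next round) can be sketched as below. This is a minimal sketch under assumptions: `query_llm`, `train_policy`, and `evaluate_policy` are hypothetical callables supplied by the caller, not STRIDE's actual interfaces.

```python
from typing import Callable

def reward_design_loop(
    env_description: str,
    query_llm: Callable[[str], str],             # hypothetical LLM call returning reward code
    train_policy: Callable[[str], object],       # trains a policy (e.g., PPO/SAC) with that reward
    evaluate_policy: Callable[[object], float],  # task metric for the trained policy
    n_iterations: int = 5,
) -> str:
    """Generate, evaluate, and iteratively refine reward functions with an LLM."""
    best_code, best_score = "", float("-inf")
    feedback = "none yet (first attempt is zero-shot)"
    for _ in range(n_iterations):
        # Context-driven generation: environment description plus prior feedback,
        # with no task-specific prompt template.
        reward_code = query_llm(
            f"Environment: {env_description}\n"
            f"Feedback from last round: {feedback}\n"
            "Write a Python reward function for humanoid locomotion."
        )
        policy = train_policy(reward_code)   # standard DRL training, unchanged
        score = evaluate_policy(policy)      # evaluate on the true task objective
        if score > best_score:
            best_code, best_score = reward_code, score
        # Turn training outcomes into textual context for the next generation round.
        feedback = f"previous reward scored {score:.2f}; improve the weak terms"
    return best_code
```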

📝 Abstract
Humanoid robotics presents significant challenges in artificial intelligence, requiring precise coordination and control of high-degree-of-freedom systems. Designing effective reward functions for deep reinforcement learning (DRL) in this domain remains a critical bottleneck, demanding extensive manual effort, domain expertise, and iterative refinement. To overcome these challenges, we introduce STRIDE, a novel framework built on agentic engineering to automate reward design, DRL training, and feedback optimization for humanoid robot locomotion tasks. By combining the structured principles of agentic engineering with large language models (LLMs) for code-writing, zero-shot generation, and in-context optimization, STRIDE generates, evaluates, and iteratively refines reward functions without relying on task-specific prompts or templates. Across diverse environments featuring humanoid robot morphologies, STRIDE outperforms the state-of-the-art reward design framework EUREKA, achieving significant improvements in efficiency and task performance. Using STRIDE-generated rewards, simulated humanoid robots achieve sprint-level locomotion across complex terrains, highlighting its ability to advance DRL workflows and humanoid robotics research.
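For concreteness, the kind of reward code such a framework asks an LLM to produce might resemble the following sketch, combining speed, posture, and energy terms; the terms, signature, and weights are illustrative assumptions, not an actual STRIDE output.

```python
import numpy as np

def locomotion_reward(
    forward_velocity: float,    # torso velocity along the run direction (m/s)
    torso_height: float,        # current torso height (m)
    target_height: float,       # nominal upright torso height (m)
    joint_torques: np.ndarray,  # torques applied at the actuated joints
) -> float:
    """Hypothetical LLM-written reward: run fast, stay upright, waste little energy."""
    velocity_term = forward_velocity                                   # reward forward speed
    upright_term = np.exp(-8.0 * (torso_height - target_height) ** 2)  # penalize crouching or falling
    energy_term = 1e-3 * float(np.sum(np.square(joint_torques)))       # discourage costly, jerky torques
    return velocity_term + 0.5 * float(upright_term) - energy_term
```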
Problem

Research questions and friction points this paper is trying to address.

Reward design for humanoid DRL demands heavy manual tuning and domain expertise
Iterative refinement of reward functions slows down DRL training
Training feedback is not systematically exploited to optimize rewards for complex locomotion tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automates reward generation, evaluation, and iterative refinement in DRL
Integrates agentic engineering with LLM code-writing, zero-shot generation, and in-context optimization
Plugs generated rewards into standard DRL algorithms such as PPO and SAC (see the sketch below)
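One way to picture the interfacing with off-the-shelf DRL algorithms is to wrap the environment so the generated reward replaces the built-in one. The sketch below uses Gymnasium and Stable-Baselines3 as assumed tooling; the placeholder reward and the `Humanoid-v4` environment are illustrative, not the paper's setup.

```python
import gymnasium as gym
from stable_baselines3 import PPO

class GeneratedRewardWrapper(gym.Wrapper):
    """Swap the environment's built-in reward for a generated one."""

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        return obs, self._generated_reward(obs, action), terminated, truncated, info

    @staticmethod
    def _generated_reward(obs, action) -> float:
        # Placeholder: a real pipeline would load the LLM-written function here.
        return float(obs[0])  # torso height as a trivial stand-in term

env = GeneratedRewardWrapper(gym.make("Humanoid-v4"))
model = PPO("MlpPolicy", env, verbose=0)   # off-the-shelf PPO, no algorithm changes
model.learn(total_timesteps=10_000)
```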
👥 Authors
Zhenwei Wu
Zhicheng AI, Hangzhou, China; School of Automation Science and Engineering, South China University of Technology, Guangzhou, China
Jinxiong Lu
Zhicheng AI, Hangzhou, China
Yuxiao Chen
Zhicheng AI, Hangzhou, China
Yunxin Liu
IEEE Fellow, Guoqiang Professor, Institute for AI Industry Research (AIR), Tsinghua University
Mobile Computing · Edge Computing · AIoT · Systems · Networking
Yueting Zhuang
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Luhui Hu
Aurorain AI, ex-Meta, Microsoft, Amazon
AI engineering · data cloud · foundation models