STRIDE: Automating Reward Design, Deep Reinforcement Learning Training and Feedback Optimization in Humanoid Robotics Locomotion

📅 2025-02-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address a key bottleneck in humanoid robot gait control, namely that deep reinforcement learning (DRL) reward function design relies heavily on manual tuning and domain expertise, this paper proposes the first end-to-end reward auto-generation framework integrating agentic engineering with large language models (LLMs). The method requires no task-specific prompts or templates, enabling zero-shot reward generation and context-driven iterative refinement, and it interfaces directly with standard DRL algorithms such as PPO and SAC. Evaluated in simulation environments spanning multiple humanoid robot morphologies, it significantly outperforms EUREKA: it achieves high-dynamic, sprint-level gaits on complex terrains while improving training efficiency by 37% and task performance by 29%. Its core contribution is the systematic integration of LLM capabilities (code generation, zero-shot reasoning, and in-context optimization) into reward engineering, establishing a fully automated closed loop from reward specification to policy learning.
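The closed loop the summary describes (an LLM writes reward code, a standard DRL algorithm trains on it, and training results feed back as context for the next round) can be sketched as below. This is a minimal sketch under assumptions: `query_llm`, `train_policy`, and `evaluate_policy` are hypothetical callables supplied by the caller, not STRIDE's actual interfaces.

```python
from typing import Callable

def reward_design_loop(
    env_description: str,
    query_llm: Callable[[str], str],             # hypothetical LLM call returning reward code
    train_policy: Callable[[str], object],       # trains a policy (e.g., PPO/SAC) with that reward
    evaluate_policy: Callable[[object], float],  # task metric for the trained policy
    n_iterations: int = 5,
) -> str:
    """Generate, evaluate, and iteratively refine reward functions with an LLM."""
    best_code, best_score = "", float("-inf")
    feedback = "none yet (first attempt is zero-shot)"
    for _ in range(n_iterations):
        # Context-driven generation: environment description plus prior feedback,
        # with no task-specific prompt template.
        reward_code = query_llm(
            f"Environment: {env_description}\n"
            f"Feedback from last round: {feedback}\n"
            "Write a Python reward function for humanoid locomotion."
        )
        policy = train_policy(reward_code)   # standard DRL training, unchanged
        score = evaluate_policy(policy)      # evaluate on the true task objective
        if score > best_score:
            best_code, best_score = reward_code, score
        # Turn training outcomes into textual context for the next generation round.
        feedback = f"previous reward scored {score:.2f}; improve the weak terms"
    return best_code
```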

📝 Abstract
Humanoid robotics presents significant challenges in artificial intelligence, requiring precise coordination and control of high-degree-of-freedom systems. Designing effective reward functions for deep reinforcement learning (DRL) in this domain remains a critical bottleneck, demanding extensive manual effort, domain expertise, and iterative refinement. To overcome these challenges, we introduce STRIDE, a novel framework built on agentic engineering to automate reward design, DRL training, and feedback optimization for humanoid robot locomotion tasks. By combining the structured principles of agentic engineering with large language models (LLMs) for code-writing, zero-shot generation, and in-context optimization, STRIDE generates, evaluates, and iteratively refines reward functions without relying on task-specific prompts or templates. Across diverse environments featuring humanoid robot morphologies, STRIDE outperforms the state-of-the-art reward design framework EUREKA, achieving significant improvements in efficiency and task performance. Using STRIDE-generated rewards, simulated humanoid robots achieve sprint-level locomotion across complex terrains, highlighting its ability to advance DRL workflows and humanoid robotics research.
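For concreteness, the kind of reward code such a framework asks an LLM to produce might resemble the following sketch, combining speed, posture, and energy terms; the terms, signature, and weights are illustrative assumptions, not an actual STRIDE output.

```python
import numpy as np

def locomotion_reward(
    forward_velocity: float,    # torso velocity along the run direction (m/s)
    torso_height: float,        # current torso height (m)
    target_height: float,       # nominal upright torso height (m)
    joint_torques: np.ndarray,  # torques applied at the actuated joints
) -> float:
    """Hypothetical LLM-written reward: run fast, stay upright, waste little energy."""
    velocity_term = forward_velocity                                   # reward forward speed
    upright_term = np.exp(-8.0 * (torso_height - target_height) ** 2)  # penalize crouching or falling
    energy_term = 1e-3 * float(np.sum(np.square(joint_torques)))       # discourage costly, jerky torques
    return velocity_term + 0.5 * float(upright_term) - energy_term
```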
Problem

Research questions and friction points this paper is trying to address.

Reward design for humanoid DRL demands heavy manual tuning and domain expertise
Iterative refinement of reward functions slows down DRL training
Training feedback is not systematically exploited to optimize rewards for complex locomotion tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automates reward generation, evaluation, and iterative refinement in DRL
Integrates agentic engineering with LLM code-writing, zero-shot generation, and in-context optimization
Plugs generated rewards into standard DRL algorithms such as PPO and SAC (see the sketch below)
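One way to picture the interfacing with off-the-shelf DRL algorithms is to wrap the environment so the generated reward replaces the built-in one. The sketch below uses Gymnasium and Stable-Baselines3 as assumed tooling; the placeholder reward and the `Humanoid-v4` environment are illustrative, not the paper's setup.

```python
import gymnasium as gym
from stable_baselines3 import PPO

class GeneratedRewardWrapper(gym.Wrapper):
    """Swap the environment's built-in reward for a generated one."""

    def step(self, action):
        obs, _, terminated, truncated, info = self.env.step(action)
        return obs, self._generated_reward(obs, action), terminated, truncated, info

    @staticmethod
    def _generated_reward(obs, action) -> float:
        # Placeholder: a real pipeline would load the LLM-written function here.
        return float(obs[0])  # torso height as a trivial stand-in term

env = GeneratedRewardWrapper(gym.make("Humanoid-v4"))
model = PPO("MlpPolicy", env, verbose=0)   # off-the-shelf PPO, no algorithm changes
model.learn(total_timesteps=10_000)
```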
👥 Authors
Zhenwei Wu
Zhicheng AI, Hangzhou, China; School of Automation Science and Engineering, South China University of Technology, Guangzhou, China
Jinxiong Lu
Zhicheng AI, Hangzhou, China
Yuxiao Chen
Zhicheng AI, Hangzhou, China
Yunxin Liu
IEEE Fellow, Guoqiang Professor, Institute for AI Industry Research (AIR), Tsinghua University
Mobile Computing · Edge Computing · AIoT · Systems · Networking
Yueting Zhuang
College of Computer Science and Technology, Zhejiang University, Hangzhou, China
Luhui Hu
Aurorain AI, ex-Meta, Microsoft, Amazon
AI engineering · data cloud · foundation models