RDA: Reward Design Agent for Reinforcement Learning

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of aligning reward functions with human intent in reinforcement learning, where handcrafted rewards often fail to capture nuanced objectives and existing automated approaches suffer from semantic misalignment due to insufficient feedback. To overcome this, the paper introduces a closed-loop reward optimization framework grounded in vision-language models (VLMs), which for the first time integrates visual semantic understanding into automatic reward generation. The framework iteratively refines reward functions through task decomposition, visual evaluation of agent trajectories, induction of failure modes, and code-level reward updates, thereby achieving high alignment between learned policies and natural language instructions. Experiments across 16 tasks in ManiSkill and HumanoidBench demonstrate that the proposed method significantly improves policy-instruction consistency while maintaining task success rates comparable to strong baselines.

📝 Abstract

Reinforcement learning has enabled the acquisition of impressive robotic skills, but it typically requires hand-crafted reward functions that are slow to design and difficult to align with human intentions. Recent methods, such as Eureka, automate reward design by using an LLM to iteratively generate and refine reward code from task descriptions. However, these methods rely on coarse feedback signals such as success rate, which provide little semantic insight into the learned behavior. As a result, their trained policies achieve the final goal but are frequently poorly aligned with task instructions. We introduce the Reward Design Agent (RDA), a VLM-based agentic framework that injects semantic understanding into reward design. RDA decomposes tasks, visually evaluates trajectories, summarizes failure modes, and iteratively revises reward code to better align with task instructions. Across 12 tabletop manipulation tasks from ManiSkill and 4 whole-body manipulation tasks from HumanoidBench, RDA produces policies that are substantially more instruction-aligned than those of the baselines, while achieving comparable task success rates. Videos and the generated reward code are available at https://nitinkamra1992.github.io/reward-design-agent.
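
The four steps named in the abstract (decompose, visually evaluate, summarize failure modes, revise reward code) can be read as a closed loop. The following is a minimal sketch of that control flow only, not the paper's implementation: every function name, the `Evaluation` type, the stopping rule, and the toy stand-ins are hypothetical and chosen for illustration. In a real system the callables would wrap a VLM, an RL trainer, and a simulator.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Evaluation:
    """Hypothetical record of a judgment on one subtask of a trajectory."""
    subtask: str
    passed: bool
    note: str = ""


def reward_design_loop(
    instruction: str,
    initial_reward_code: str,
    decompose: Callable[[str], List[str]],
    train_and_render: Callable[[str], object],
    evaluate: Callable[[object, str], Evaluation],
    summarize: Callable[[List[Evaluation]], str],
    revise: Callable[[str, str], str],
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Return the final reward code and the number of revisions applied."""
    subtasks = decompose(instruction)            # step 1: task decomposition
    reward_code = initial_reward_code
    for round_idx in range(max_rounds):
        trajectory = train_and_render(reward_code)
        # Step 2: judge the rollout against each subtask.
        evals = [evaluate(trajectory, s) for s in subtasks]
        failures = [e for e in evals if not e.passed]
        if not failures:                         # assumed stopping rule
            return reward_code, round_idx
        feedback = summarize(failures)           # step 3: failure modes
        reward_code = revise(reward_code, feedback)  # step 4: code update
    return reward_code, max_rounds


# Toy stand-ins so the sketch runs without a simulator or a model.
def toy_decompose(instruction: str) -> List[str]:
    return [part.strip() for part in instruction.split(", then ")]


def toy_train_and_render(reward_code: str) -> object:
    # Pretend the "trajectory" exhibits exactly the terms the reward mentions.
    return set(reward_code.split("+"))


def toy_evaluate(trajectory: object, subtask: str) -> Evaluation:
    ok = subtask in trajectory
    return Evaluation(subtask, ok, "" if ok else f"missing: {subtask}")


def toy_summarize(failures: List[Evaluation]) -> str:
    return failures[0].subtask                   # one failure mode per round


def toy_revise(reward_code: str, feedback: str) -> str:
    return reward_code + "+" + feedback


final_code, rounds = reward_design_loop(
    "grasp cube, then lift cube",
    "grasp cube",
    toy_decompose,
    toy_train_and_render,
    toy_evaluate,
    toy_summarize,
    toy_revise,
)
```

Passing the components in as callables keeps the loop independent of any particular model or simulator, so the same skeleton works whether the feedback is a coarse success rate or a per-subtask visual judgment. The abstract's contrast with prior work is about what `evaluate` returns, not about the loop itself.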

Problem

Research questions and friction points this paper is trying to address.

reward design

reinforcement learning

instruction alignment

semantic understanding

robotic manipulation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reward Design

Vision-Language Model

Instruction Alignment