🤖 AI Summary
Traditional inverse reinforcement learning (IRL) typically yields reward functions as opaque black-box models, which hinders debugging and formal verification. To address this, we propose GRACE, a framework that combines large language models (LLMs) with evolutionary search for interpretable reward engineering: LLMs generate candidate code-based reward functions, while evolutionary search optimizes their fit to expert demonstrations. The resulting rewards are human-readable, executable, and amenable to formal verification, and they support complex multi-task settings. Evaluated on the BabyAI and AndroidWorld benchmarks, GRACE-recovered reward functions significantly outperform imitation learning and online RL baselines, improving policy performance by 12–28%. Moreover, GRACE enables unified modeling of multi-task rewards and facilitates the construction of reusable reward APIs, marking the first approach to jointly leverage LLMs and evolutionary optimization for verifiable, interpretable reward synthesis in IRL.
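The loop described above (LLM-generated candidate reward programs, selected by evolutionary search against expert demonstrations) can be sketched in miniature. This is a hypothetical simplification, not the paper's implementation: the candidate pool, the `fitness`/`evolve` names, and the toy state representation (a dict with a single `dist` feature) are all invented for illustration, and the LLM mutation step is replaced by refilling from a fixed pool.

```python
import random

# Toy sketch of GRACE's core loop (hypothetical simplification): in the paper,
# candidate code-based reward functions are proposed and mutated by an LLM,
# and evolutionary search keeps those that best fit expert demonstrations.
# Here a small pool of hand-written Python functions stands in for the LLM
# proposals, and a state is a dict with one assumed 'dist' feature
# (distance to goal) -- not the paper's actual representation.

def fitness(reward_fn, expert_trajs, random_trajs):
    """Score a candidate: expert trajectories should accumulate more reward
    than random ones (a simple margin objective)."""
    total = lambda traj: sum(reward_fn(s) for s in traj)
    expert_avg = sum(total(t) for t in expert_trajs) / len(expert_trajs)
    random_avg = sum(total(t) for t in random_trajs) / len(random_trajs)
    return expert_avg - random_avg

def evolve(candidates, expert_trajs, random_trajs, rounds=3, keep=2):
    """Greedy evolutionary search: each round, rank the pool by fitness and
    keep the top `keep` survivors.  (GRACE would instead ask the LLM to
    mutate survivors into new programs; we simply refill from the pool.)"""
    pool = list(candidates)
    for _ in range(rounds):
        pool.sort(key=lambda f: fitness(f, expert_trajs, random_trajs),
                  reverse=True)
        pool = pool[:keep] + random.sample(candidates, len(candidates) - keep)
    return pool[0]  # best-scoring candidate found

# Mock "expert" trajectories move toward the goal; random ones do not.
EXPERT_TRAJS = [[{'dist': d} for d in (3.0, 2.0, 1.0, 0.0)]]
RANDOM_TRAJS = [[{'dist': 3.0}] * 4]
CANDIDATES = [
    lambda s: -s['dist'],   # plausible shaped reward
    lambda s: s['dist'],    # inverted (bad) reward
    lambda s: 0.0,          # uninformative reward
]

best = evolve(CANDIDATES, EXPERT_TRAJS, RANDOM_TRAJS)
```

Because each candidate is plain code, the selected reward (`-dist` here) can be read, debugged, and checked directly, which is the interpretability property the summary emphasizes.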
📝 Abstract
Inverse Reinforcement Learning (IRL) aims to recover reward models from expert demonstrations, but traditional methods yield "black-box" models that are difficult to interpret and debug. In this work, we introduce GRACE (Generating Rewards As CodE), a method that uses Large Language Models within an evolutionary search to reverse-engineer an interpretable, code-based reward function directly from expert trajectories. The resulting reward function is executable code that can be inspected and verified. We empirically validate GRACE on the BabyAI and AndroidWorld benchmarks, where it efficiently learns highly accurate rewards even in complex, multi-task settings. Further, we demonstrate that the resulting reward leads to strong policies compared to both competitive Imitation Learning baselines and online RL approaches trained with ground-truth rewards. Finally, we show that GRACE can build complex reward APIs in multi-task setups.