🤖 AI Summary
Traditional inverse reinforcement learning (IRL) typically yields reward functions as opaque black-box models, which hinders debugging and formal verification. To address this, we propose GRACE, a framework that combines large language models (LLMs) with evolutionary search for interpretable reward engineering: LLMs generate candidate code-based reward functions, while evolutionary search optimizes their fit to expert demonstrations. The resulting rewards are human-readable, executable, and amenable to formal verification, and they support complex multi-task settings. Evaluated on the BabyAI and AndroidWorld benchmarks, GRACE-recovered reward functions significantly outperform imitation learning and online RL baselines, improving policy performance by 12–28%. Moreover, GRACE enables unified modeling of multi-task rewards and facilitates the construction of reusable reward APIs, marking the first approach to jointly leverage LLMs and evolutionary optimization for verifiable, interpretable reward synthesis in IRL.
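The loop described above (LLM-generated candidate reward programs, selected by evolutionary search against expert demonstrations) can be sketched in miniature. This is a hypothetical simplification, not the paper's implementation: the candidate pool, the `fitness`/`evolve` names, and the toy state representation (a dict with a single `dist` feature) are all invented for illustration, and the LLM mutation step is replaced by refilling from a fixed pool.

```python
import random

# Toy sketch of GRACE's core loop (hypothetical simplification): in the paper,
# candidate code-based reward functions are proposed and mutated by an LLM,
# and evolutionary search keeps those that best fit expert demonstrations.
# Here a small pool of hand-written Python functions stands in for the LLM
# proposals, and a state is a dict with one assumed 'dist' feature
# (distance to goal) -- not the paper's actual representation.

def fitness(reward_fn, expert_trajs, random_trajs):
    """Score a candidate: expert trajectories should accumulate more reward
    than random ones (a simple margin objective)."""
    total = lambda traj: sum(reward_fn(s) for s in traj)
    expert_avg = sum(total(t) for t in expert_trajs) / len(expert_trajs)
    random_avg = sum(total(t) for t in random_trajs) / len(random_trajs)
    return expert_avg - random_avg

def evolve(candidates, expert_trajs, random_trajs, rounds=3, keep=2):
    """Greedy evolutionary search: each round, rank the pool by fitness and
    keep the top `keep` survivors.  (GRACE would instead ask the LLM to
    mutate survivors into new programs; we simply refill from the pool.)"""
    pool = list(candidates)
    for _ in range(rounds):
        pool.sort(key=lambda f: fitness(f, expert_trajs, random_trajs),
                  reverse=True)
        pool = pool[:keep] + random.sample(candidates, len(candidates) - keep)
    return pool[0]  # best-scoring candidate found

# Mock "expert" trajectories move toward the goal; random ones do not.
EXPERT_TRAJS = [[{'dist': d} for d in (3.0, 2.0, 1.0, 0.0)]]
RANDOM_TRAJS = [[{'dist': 3.0}] * 4]
CANDIDATES = [
    lambda s: -s['dist'],   # plausible shaped reward
    lambda s: s['dist'],    # inverted (bad) reward
    lambda s: 0.0,          # uninformative reward
]

best = evolve(CANDIDATES, EXPERT_TRAJS, RANDOM_TRAJS)
```

Because each candidate is plain code, the selected reward (`-dist` here) can be read, debugged, and checked directly, which is the interpretability property the summary emphasizes.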
📝 Abstract
Inverse Reinforcement Learning (IRL) aims to recover reward models from expert demonstrations, but traditional methods yield "black-box" models that are difficult to interpret and debug. In this work, we introduce GRACE (Generating Rewards As CodE), a method that uses Large Language Models within an evolutionary search to reverse-engineer an interpretable, code-based reward function directly from expert trajectories. The resulting reward function is executable code that can be inspected and verified. We empirically validate GRACE on the BabyAI and AndroidWorld benchmarks, where it efficiently learns highly accurate rewards even in complex, multi-task settings. Further, we demonstrate that the resulting reward leads to strong policies compared to both competitive Imitation Learning baselines and online RL approaches trained with ground-truth rewards. Finally, we show that GRACE can build complex reward APIs in multi-task setups.