Rationality Measurement and Theory for Reinforcement Learning Agents

📅 2026-02-04
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the lack of a quantitative framework for evaluating rationality in reinforcement learning agents and measuring deviations from optimal rational behavior. The authors introduce the concept of “rational risk,” defined as the value discrepancy between an agent’s policy actions and ideal rational actions. They formulate the expected rational risk during deployment and its empirical estimate during training, establishing the first formal metric system for rationality in reinforcement learning. Theoretical analysis decomposes the rational risk gap into an extrinsic component—induced by environment shifts and characterized by the 1-Wasserstein distance—and an intrinsic component—stemming from algorithmic generalization and bounded by empirical Rademacher complexity—with accompanying error bounds. Experiments confirm that layer normalization, ℓ² regularization, and weight normalization enhance rationality, while environmental shifts degrade it, aligning closely with theoretical predictions.
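To make the "empirical rational risk" idea concrete, here is a minimal sketch in our own notation (not the paper's code): it averages, over a trajectory, the value gap between the action the agent actually took and the greedy action under an estimated value function. The tabular `Q`, the `(state, action)` trajectory format, and the function name are all hypothetical illustrations.

```python
import numpy as np

def empirical_rational_risk(Q, trajectory):
    """Hypothetical estimate of empirical rational risk: the mean value
    discrepancy between the agent's chosen actions and the greedy
    ("rational") actions under an estimated action-value table Q."""
    gaps = []
    for state, action in trajectory:
        values = Q[state]              # estimated action values at this state
        rational_value = values.max()  # value of the greedy (rational) action
        gaps.append(rational_value - values[action])
    return float(np.mean(gaps))

# Toy example: 3 states, 2 actions; trajectory of (state, action) pairs.
Q = np.array([[1.0, 0.5],
              [0.2, 0.9],
              [0.4, 0.4]])
trajectory = [(0, 0), (1, 0), (2, 1)]
risk = empirical_rational_risk(Q, trajectory)
```

In this toy run only the second step is suboptimal (gap 0.7), so the empirical risk is 0.7 / 3; a perfectly rational trajectory would score exactly 0.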

Technology Category

Application Category

📝 Abstract
This paper proposes a suite of rationality measures and associated theory for reinforcement learning agents, a property increasingly critical yet rarely explored. We define an action in deployment to be perfectly rational if it maximises the hidden true value function in the steepest direction. The expected value discrepancy of a policy's actions against their rational counterparts, accumulated over the trajectory in deployment, is defined to be the expected rational risk; an empirical average version in training is also defined. Their difference, termed the rational risk gap, is decomposed into (1) an extrinsic component caused by environment shifts between training and deployment, and (2) an intrinsic one due to the algorithm's generalisability in a dynamic environment. They are upper bounded by, respectively, (1) the $1$-Wasserstein distance between transition kernels and initial state distributions in training and deployment, and (2) the empirical Rademacher complexity of the value function class. Our theory suggests hypotheses on the benefits from regularisers (including layer normalisation, $\ell_2$ regularisation, and weight normalisation) and domain randomisation, as well as the harm from environment shifts. Experiments are in full agreement with these hypotheses. The code is available at https://github.com/EVIEHub/Rationality.
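The decomposition and bounds described in the abstract can be sketched schematically as follows. The symbols here are our own shorthand, not necessarily the paper's notation: $R(\pi)$ and $\hat{R}(\pi)$ for the deployment and training rational risks, $P$ for transition kernels, $\mu$ for initial state distributions, and $C_1, C_2$ for unspecified constants.

```latex
% Hedged sketch of the rational risk gap bound; notation is illustrative.
R(\pi) - \hat{R}(\pi)
  \;\le\;
  \underbrace{C_1 \Big( W_1\big(P_{\mathrm{train}}, P_{\mathrm{deploy}}\big)
      + W_1\big(\mu_{\mathrm{train}}, \mu_{\mathrm{deploy}}\big) \Big)}_{
      \text{extrinsic: environment shift}}
  \;+\;
  \underbrace{C_2 \, \hat{\mathfrak{R}}_n(\mathcal{V})}_{
      \text{intrinsic: empirical Rademacher complexity of the value class}}
```

The extrinsic term vanishes when training and deployment environments coincide, which matches the reported experimental finding that environment shifts degrade rationality while capacity-controlling regularisers (which shrink the Rademacher term) improve it.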
Problem

Research questions and friction points this paper is trying to address.

rationality
reinforcement learning
value discrepancy
environment shift
generalisation
Innovation

Methods, ideas, or system contributions that make the work stand out.

rationality measurement
reinforcement learning
rational risk gap
generalization theory
environment shift
Kejiang Qian
University of Edinburgh
A. Storkey
University of Edinburgh
Fengxiang He
University of Edinburgh
trustworthy AI · deep learning theory