Average-Reward Reinforcement Learning with Entropy Regularization

📅 2025-01-15

📈 Citations: 0

✨ Influential: 0

career value

239K/year

🤖 AI Summary

This paper addresses the issues of discounting-induced bias and insufficient policy robustness in long-horizon decision-making tasks by proposing the first systematic entropy-regularized average-reward reinforcement learning framework. Methodologically, it tightly integrates entropy regularization with the average-reward objective, yielding a scalable algorithm based on policy gradients and dual optimization—supporting neural function approximation and online policy updates. Theoretically and technically, it fills a critical gap at the intersection of average-reward RL and entropy regularization: it eliminates the inherent temporal bias of discounted MDPs while enhancing exploration stability and resilience to environmental perturbations. Empirical evaluation on standard RL benchmarks demonstrates that the proposed algorithm consistently outperforms existing average-reward and entropy-regularized baselines across three key metrics: convergence speed, asymptotic reward performance, and policy robustness.

Technology Category

Application Category

📝 Abstract

The average-reward formulation of reinforcement learning (RL) has drawn increased interest in recent years due to its ability to solve temporally-extended problems without discounting. Independently, RL algorithms have benefited from entropy-regularization: an approach used to make the optimal policy stochastic, thereby more robust to noise. Despite the distinct benefits of the two approaches, the combination of entropy regularization with an average-reward objective is not well-studied in the literature and there has been limited development of algorithms for this setting. To address this gap in the field, we develop algorithms for solving entropy-regularized average-reward RL problems with function approximation. We experimentally validate our method, comparing it with existing algorithms on standard benchmarks for RL.

Problem

Research questions and friction points this paper is trying to address.

Reinforcement Learning

Entropy Regularization

Long-term Planning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy Regularization

Average Reward Reinforcement Learning

Function Approximation

🔎 Similar Papers

No similar papers found.