🤖 AI Summary
This paper addresses the instability of value function estimation in reinforcement learning by proposing Bellman Error Centering (BEC), a unified framework which reveals that value-based reward centering (VRC) is inherently an instance of BEC. The framework systematically resolves the challenge of constructing centered fixed points for both tabular and linear value function approximation. The authors introduce the Centered TD (CTD) and Centered TD Correction (CTDC) algorithms and prove their convergence in the on-policy and off-policy settings, respectively. The BEC paradigm generalizes to a broad class of temporal-difference algorithms, enhancing training stability. Empirical evaluations show that the proposed methods outperform conventional reward centering across diverse tasks and policy distributions, delivering consistent gains without task-specific tuning.
📝 Abstract
This paper revisits the recently proposed reward centering algorithms, simple reward centering (SRC) and value-based reward centering (VRC), and shows that SRC indeed performs reward centering, whereas VRC essentially performs Bellman error centering (BEC). Based on BEC, we derive the centered fixed point for tabular value functions, as well as the centered TD fixed point for linear value function approximation. We design the on-policy CTD algorithm and the off-policy CTDC algorithm, and prove the convergence of both. Finally, we experimentally validate the stability of the proposed algorithms. Bellman error centering extends readily to a variety of reinforcement learning algorithms.
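To make the centering idea concrete, here is a minimal, hypothetical sketch of value-based centering in a tabular setting: the reward-rate estimate `r_bar` is updated from the TD error itself (the Bellman-error-centering view), rather than from raw rewards as in SRC. The two-state chain, step sizes, and variable names are illustrative assumptions for this sketch, not the paper's exact CTD algorithm.

```python
# Hypothetical sketch: tabular TD learning with a reward-rate estimate
# that is driven by the TD (Bellman) error, on a deterministic two-state
# reward cycle: s0 --r=1--> s1 --r=0--> s0. All constants are illustrative.

GAMMA = 0.99   # discount factor
ALPHA = 0.1    # value-function step size
BETA = 0.01    # (slower) step size for the reward-rate estimate

# transitions[s] = (reward, next_state)
transitions = {0: (1.0, 1), 1: (0.0, 0)}

v = [0.0, 0.0]   # value estimates
r_bar = 0.0      # reward-rate estimate, updated from the TD error

s = 0
for _ in range(20000):
    r, s_next = transitions[s]
    # TD error with the reward-rate estimate subtracted ("centering")
    delta = r - r_bar + GAMMA * v[s_next] - v[s]
    v[s] += ALPHA * delta
    r_bar += BETA * delta   # value-based update: driven by the TD error
    s = s_next

print(round(r_bar, 3), [round(x, 3) for x in v])
```

On this cycle the true average reward is 0.5; updating `r_bar` from the TD error drives it close to that value at the fixed point, while the value estimates absorb the remaining offset.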