Bellman Error Centering

📅 2025-02-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the instability of value function estimation in reinforcement learning by proposing Bellman Error Centering (BEC), a unified framework which shows that value-based reward centering (VRC) is essentially an instance of BEC. The framework systematically resolves the challenge of constructing centered fixed points for both tabular and linear value function approximation. The authors introduce the Centered TD (CTD) and Centered TD Correction (CTDC) algorithms and prove their convergence in the on-policy and off-policy settings, respectively. The BEC paradigm extends naturally to a broad class of temporal-difference algorithms. Experiments validate the stability of the proposed algorithms across diverse tasks and policy distributions without task-specific tuning.

📝 Abstract
This paper revisits the recently proposed reward centering algorithms, including simple reward centering (SRC) and value-based reward centering (VRC), and points out that SRC is indeed reward centering, while VRC is essentially Bellman error centering (BEC). Based on BEC, we provide the centered fixpoint for tabular value functions, as well as the centered TD fixpoint for linear value function approximation. We design the on-policy CTD algorithm and the off-policy CTDC algorithm, and prove the convergence of both algorithms. Finally, we experimentally validate the stability of our proposed algorithms. Bellman error centering facilitates the extension to various reinforcement learning algorithms.
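The centered TD fixpoint mentioned in the abstract is not reproduced on this page. As a rough sketch of the general shape of such a fixpoint for linear value function approximation, following the standard TD fixpoint and the reward-centering literature rather than this paper's exact equations (here Φ is the feature matrix, D_π and P_π the stationary-distribution and transition matrices, d_π the stationary distribution, r_π the expected reward vector):

```latex
\underbrace{\Phi^\top D_\pi (I - \gamma P_\pi)\,\Phi}_{A}\;\theta
  \;=\; \Phi^\top D_\pi \bigl(r_\pi - \bar r\,\mathbf{1}\bigr),
\qquad
\bar r \;=\; d_\pi^\top r_\pi .
```

Note that BEC centers the Bellman error rather than the reward, so the paper's actual fixpoint may subtract an average-error term in place of \bar r; the equation above only illustrates the structure of a centered fixpoint.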
Problem

Research questions and friction points this paper is trying to address.

Analyzes Bellman error centering in RL
Proves convergence of CTD and CTDC algorithms
Validates stability of proposed algorithms experimentally
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bellman Error Centering technique
On-policy CTD algorithm
Off-policy CTDC algorithm
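The CTD update itself is not shown on this page. Below is a minimal tabular sketch in the spirit of Bellman error centering: the TD error both drives the value update and is averaged into a running offset `r_bar` that is subtracted from subsequent errors. The toy environment, step sizes, and update order are illustrative assumptions, not the paper's exact CTD algorithm.

```python
import numpy as np

def centered_td(transitions, rewards, gamma=0.9, alpha=0.1, beta=0.05,
                steps=20_000):
    """Tabular centered TD(0) on a deterministic Markov reward process.

    transitions[s] gives the successor of state s; rewards[s] is the
    reward received on leaving s. Returns the learned values v and the
    learned offset r_bar.
    """
    v = np.zeros(len(transitions))
    r_bar = 0.0
    s = 0
    for _ in range(steps):
        s_next = transitions[s]
        # Centered TD error: the running offset is subtracted from the reward.
        delta = rewards[s] - r_bar + gamma * v[s_next] - v[s]
        v[s] += alpha * delta    # usual TD value update, on the centered error
        r_bar += beta * delta    # offset tracks the average TD/Bellman error
        s = s_next
    return v, r_bar

# Two-state cycle with rewards 1 and 3 (average reward 2).
v, r_bar = centered_td(transitions=[1, 0], rewards=[1.0, 3.0])
```

At a centered fixed point, v(s) = r(s) - r̄ + γ v(s'), so on the two-state cycle above the value gap v[1] - v[0] approaches 2/(1+γ) regardless of where the offset settles.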
Xingguo Chen
Nanjing University of Posts and Telecommunications, Nanjing, China
Yu Gong
Nanjing University of Posts and Telecommunications, Nanjing, China
Shangdong Yang
Nanjing University of Posts and Telecommunications
Reinforcement Learning · Multi-agent Systems · Multi-armed Bandits
Wenhao Wang
National University of Defense Technology, Hefei, China