A Spectral Revisit of the Distributional Bellman Operator under the Cramér Metric

📅 2026-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a gap in distributional reinforcement learning by moving beyond the conventional focus on metric contraction properties of the distributional Bellman operator toward a structural understanding at the level of cumulative distribution functions (CDFs). Under the L² geometry induced by the Cramér distance, the paper establishes an intrinsic linear framework characterizing the action of the distributional Bellman update in CDF space: the update is affine on CDFs and linear on differences of CDFs. The authors then introduce a spectrally regularized Hilbert representation that captures this geometry exactly; the regularization affects only the geometry, and in the zero-regularization limit the representation recovers the original Cramér metric. This formulation provides a unified functional-analytic and operator-theoretic foundation for distributional reinforcement learning.
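The Cramér distance referenced above is the L² norm of the difference of two CDFs. A minimal numerical sketch (not from the paper; the grid and distributions are illustrative choices) of this definition for discrete distributions on a common grid:

```python
import numpy as np

def cramer_distance(p, q, support):
    """l2 (Cramér) distance between discrete distributions p and q on the
    same evenly spaced grid `support`: the L2 norm of the difference of
    their CDFs, approximated by a Riemann sum over the grid."""
    dx = support[1] - support[0]
    F, G = np.cumsum(p), np.cumsum(q)          # CDFs on the grid
    return np.sqrt(np.sum((F - G) ** 2) * dx)

support = np.linspace(0.0, 1.0, 101)
p = np.full(101, 1 / 101)                      # ~uniform on [0, 1]
q = np.zeros(101)
q[50] = 1.0                                    # point mass at 0.5
print(cramer_distance(p, q, support))          # ≈ sqrt(1/12) ≈ 0.29
```

For these two distributions the continuous Cramér distance is $\sqrt{\int_0^1 (F(x)-G(x))^2\,dx} = \sqrt{1/12}$, which the grid approximation matches closely.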

📝 Abstract
Distributional reinforcement learning (DRL) studies the evolution of full return distributions under Bellman updates rather than focusing on expected values. A classical result is that the distributional Bellman operator is contractive under the Cramér metric, which corresponds to an $L^2$ geometry on differences of cumulative distribution functions (CDFs). While this contraction ensures stability of policy evaluation, existing analyses remain largely metric, focusing on contraction properties without elucidating the structural action of the Bellman update on distributions. In this work, we analyse distributional Bellman dynamics directly at the level of CDFs, treating the Cramér geometry as the intrinsic analytical setting. At this level, the Bellman update acts affinely on CDFs and linearly on differences between CDFs, and its contraction property yields a uniform bound on this linear action. Building on this intrinsic formulation, we construct a family of regularised spectral Hilbert representations that realise the CDF-level geometry by exact conjugation, without modifying the underlying Bellman dynamics. The regularisation affects only the geometry and vanishes in the zero-regularisation limit, recovering the native Cramér metric. This framework clarifies the operator structure underlying distributional Bellman updates and provides a foundation for further functional and operator-theoretic analyses in DRL.
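The contraction property mentioned in the abstract can be checked numerically. Below is a hedged sketch (not the paper's construction): a single-state Markov reward process with deterministic reward, where the distributional Bellman update is the pushforward of the return distribution under $z \mapsto r + \gamma z$. Under the Cramér metric this update is a $\sqrt{\gamma}$-contraction, which the ratio of distances before and after one update illustrates. All names, the grid, and the initial distributions are illustrative assumptions.

```python
import numpy as np

GAMMA, R = 0.8, 0.1
grid = np.linspace(0.0, 2.0, 2001)      # common grid for evaluating CDFs

def cdf_of_atoms(atoms, grid):
    """Empirical CDF of equally weighted atoms, evaluated on `grid`."""
    atoms = np.sort(atoms)
    return np.searchsorted(atoms, grid, side="right") / len(atoms)

def cramer(atoms_a, atoms_b, grid):
    """Cramér distance: L2 norm of the difference of empirical CDFs."""
    dx = grid[1] - grid[0]
    diff = cdf_of_atoms(atoms_a, grid) - cdf_of_atoms(atoms_b, grid)
    return np.sqrt(np.sum(diff ** 2) * dx)

def bellman(atoms):
    """Distributional Bellman update for a single-state MRP with
    deterministic reward: pushforward under z -> R + GAMMA * z."""
    return R + GAMMA * atoms

rng = np.random.default_rng(0)
za = rng.uniform(0.0, 1.0, size=512)    # two arbitrary initial return dists
zb = rng.uniform(0.5, 1.5, size=512)

d0 = cramer(za, zb, grid)
d1 = cramer(bellman(za), bellman(zb), grid)
print(d1 / d0)                          # ≈ sqrt(GAMMA) ≈ 0.894
```

Since the update here is a deterministic affine map, the CDF of the updated distribution is $F'(y) = F((y - r)/\gamma)$, and a change of variables gives $d(F'_a, F'_b) = \sqrt{\gamma}\, d(F_a, F_b)$ exactly; the grid approximation reproduces this ratio up to discretization error.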
Problem

Research questions and friction points this paper is trying to address.

Distributional Reinforcement Learning
Bellman Operator
Cramér Metric
Cumulative Distribution Functions
Operator Structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributional Reinforcement Learning
Cramér Metric
Bellman Operator
Spectral Representation
Cumulative Distribution Function
Keru Wang
School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland
Yixin Deng
School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland
Yao Lyu
Postdoctoral Researcher, Tsinghua University
autonomous driving · embodied AI · reinforcement learning
Stephen Redmond
School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland
Shengbo Eben Li
School of Vehicle and Mobility, Tsinghua University, Beijing, China; College of Artificial Intelligence, Tsinghua University, Beijing, China