A Spectral Revisit of the Distributional Bellman Operator under the Cramér Metric

📅 2026-03-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses a gap in distributional reinforcement learning by moving beyond the conventional focus on metric contraction properties of the distributional Bellman operator toward a structural understanding at the level of cumulative distribution functions (CDFs). Under the L² geometry induced by the Cramér distance, the paper establishes an intrinsic linear framework characterizing the action of the distributional Bellman update in CDF space: the update is affine on CDFs and linear on differences of CDFs. The authors then introduce a spectrally regularized Hilbert representation that captures this geometry exactly; the regularization affects only the geometry, and in the zero-regularization limit the representation recovers the original Cramér metric. This formulation provides a unified functional-analytic and operator-theoretic foundation for distributional reinforcement learning.
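The Cramér distance referenced above is the L² norm of the difference of two CDFs. A minimal numerical sketch (not from the paper; the grid and distributions are illustrative choices) of this definition for discrete distributions on a common grid:

```python
import numpy as np

def cramer_distance(p, q, support):
    """l2 (Cramér) distance between discrete distributions p and q on the
    same evenly spaced grid `support`: the L2 norm of the difference of
    their CDFs, approximated by a Riemann sum over the grid."""
    dx = support[1] - support[0]
    F, G = np.cumsum(p), np.cumsum(q)          # CDFs on the grid
    return np.sqrt(np.sum((F - G) ** 2) * dx)

support = np.linspace(0.0, 1.0, 101)
p = np.full(101, 1 / 101)                      # ~uniform on [0, 1]
q = np.zeros(101)
q[50] = 1.0                                    # point mass at 0.5
print(cramer_distance(p, q, support))          # ≈ sqrt(1/12) ≈ 0.29
```

For these two distributions the continuous Cramér distance is $\sqrt{\int_0^1 (F(x)-G(x))^2\,dx} = \sqrt{1/12}$, which the grid approximation matches closely.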

📝 Abstract
Distributional reinforcement learning (DRL) studies the evolution of full return distributions under Bellman updates rather than focusing on expected values. A classical result is that the distributional Bellman operator is contractive under the Cramér metric, which corresponds to an $L^2$ geometry on differences of cumulative distribution functions (CDFs). While this contraction ensures stability of policy evaluation, existing analyses remain largely metric, focusing on contraction properties without elucidating the structural action of the Bellman update on distributions. In this work, we analyse distributional Bellman dynamics directly at the level of CDFs, treating the Cramér geometry as the intrinsic analytical setting. At this level, the Bellman update acts affinely on CDFs and linearly on differences between CDFs, and its contraction property yields a uniform bound on this linear action. Building on this intrinsic formulation, we construct a family of regularised spectral Hilbert representations that realise the CDF-level geometry by exact conjugation, without modifying the underlying Bellman dynamics. The regularisation affects only the geometry and vanishes in the zero-regularisation limit, recovering the native Cramér metric. This framework clarifies the operator structure underlying distributional Bellman updates and provides a foundation for further functional and operator-theoretic analyses in DRL.
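The contraction property mentioned in the abstract can be checked numerically. Below is a hedged sketch (not the paper's construction): a single-state Markov reward process with deterministic reward, where the distributional Bellman update is the pushforward of the return distribution under $z \mapsto r + \gamma z$. Under the Cramér metric this update is a $\sqrt{\gamma}$-contraction, which the ratio of distances before and after one update illustrates. All names, the grid, and the initial distributions are illustrative assumptions.

```python
import numpy as np

GAMMA, R = 0.8, 0.1
grid = np.linspace(0.0, 2.0, 2001)      # common grid for evaluating CDFs

def cdf_of_atoms(atoms, grid):
    """Empirical CDF of equally weighted atoms, evaluated on `grid`."""
    atoms = np.sort(atoms)
    return np.searchsorted(atoms, grid, side="right") / len(atoms)

def cramer(atoms_a, atoms_b, grid):
    """Cramér distance: L2 norm of the difference of empirical CDFs."""
    dx = grid[1] - grid[0]
    diff = cdf_of_atoms(atoms_a, grid) - cdf_of_atoms(atoms_b, grid)
    return np.sqrt(np.sum(diff ** 2) * dx)

def bellman(atoms):
    """Distributional Bellman update for a single-state MRP with
    deterministic reward: pushforward under z -> R + GAMMA * z."""
    return R + GAMMA * atoms

rng = np.random.default_rng(0)
za = rng.uniform(0.0, 1.0, size=512)    # two arbitrary initial return dists
zb = rng.uniform(0.5, 1.5, size=512)

d0 = cramer(za, zb, grid)
d1 = cramer(bellman(za), bellman(zb), grid)
print(d1 / d0)                          # ≈ sqrt(GAMMA) ≈ 0.894
```

Since the update here is a deterministic affine map, the CDF of the updated distribution is $F'(y) = F((y - r)/\gamma)$, and a change of variables gives $d(F'_a, F'_b) = \sqrt{\gamma}\, d(F_a, F_b)$ exactly; the grid approximation reproduces this ratio up to discretization error.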
Problem

Research questions and friction points this paper is trying to address.

Distributional Reinforcement Learning
Bellman Operator
Cramér Metric
Cumulative Distribution Functions
Operator Structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distributional Reinforcement Learning
Cramér Metric
Bellman Operator
Spectral Representation
Cumulative Distribution Function
Keru Wang
School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland
Yixin Deng
School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland
Yao Lyu
Postdoctoral Researcher, Tsinghua University
autonomous driving · embodied AI · reinforcement learning
Stephen Redmond
School of Electrical and Electronic Engineering, University College Dublin, Dublin, Ireland
Shengbo Eben Li
School of Vehicle and Mobility, Tsinghua University, Beijing, China; College of Artificial Intelligence, Tsinghua University, Beijing, China