Multivariate Distributional Reinforcement Learning Using Sliced Divergences

📅 2026-05-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a gap in multivariate distributional reinforcement learning: no available distance metric is both computationally tractable and known to make the Bellman operator a contraction, particularly under matrix-valued discounting, for which no contraction results existed. The paper introduces Sliced Distributional Reinforcement Learning (SDRL), which extends tractable one-dimensional divergences such as Wasserstein, Cramér, and Maximum Mean Discrepancy (MMD) to multivariate return distributions via slicing projections. It proves Bellman contraction for uniform slicing under shared scalar discounting and, through a maximum-slicing variant, under general dense discount matrices. The framework covers multiple base divergences within one theoretical treatment and is evaluated on a toy chain problem, an image-based gridworld, and a subset of Atari games.

📝 Abstract

Distributional reinforcement learning (DRL) models the full return distribution rather than expectations, but extending it to multivariate settings remains challenging. Many common metrics do not naturally generalize beyond one dimension or lose computational tractability, and the multivariate case introduces additional difficulties such as general matrix discounting, for which no contraction results are available. We introduce Sliced Distributional Reinforcement Learning (SDRL), which lifts tractable one-dimensional divergences to multivariate return distributions via projections. We prove Bellman contraction for uniform slicing under shared scalar discounting, and introduce a maximum-slicing variant with contraction under general dense discount matrices. SDRL supports a broad class of base divergences; we analyze Wasserstein, Cramér, and Maximum Mean Discrepancy (MMD), and characterize which SDRL variants suit the standard single-sample Bellman update used in distributional RL. We evaluate SDRL on a toy chain problem and a gridworld image-based environment as well as a subset of Atari games.
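The slicing idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the paper's implementation: it assumes equal-size empirical samples, uses the one-dimensional Wasserstein distance as the base divergence, and approximates both uniform slicing (an average over random unit directions) and maximum slicing (a maximum over the same finite set of directions, rather than a true supremum). The names `wasserstein_1d`, `random_directions`, and `sliced_wasserstein` are hypothetical.

```python
import numpy as np


def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between two equal-size 1-D empirical samples.

    For equal-size samples the optimal coupling matches sorted values.
    """
    x_sorted = np.sort(x)
    y_sorted = np.sort(y)
    return np.mean(np.abs(x_sorted - y_sorted) ** p) ** (1.0 / p)


def random_directions(num_slices, dim, rng):
    """Draw unit vectors uniformly on the sphere by normalizing Gaussians."""
    v = rng.normal(size=(num_slices, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def sliced_wasserstein(x, y, directions, p=2, reduce="mean"):
    """Lift the 1-D Wasserstein distance to d dimensions via projections.

    x, y:        arrays of shape (n, d), samples of two return distributions.
    directions:  array of shape (k, d), unit projection vectors.
    reduce:      "mean" approximates uniform slicing (assumption: Monte Carlo
                 average over the given directions); "max" approximates
                 maximum slicing over the same finite set.
    """
    # Project every sample onto every direction: (n, d) @ (d, k) -> (n, k).
    px = x @ directions.T
    py = y @ directions.T
    per_slice = np.array(
        [wasserstein_1d(px[:, k], py[:, k], p) for k in range(directions.shape[0])]
    )
    if reduce == "mean":
        return np.mean(per_slice ** p) ** (1.0 / p)
    if reduce == "max":
        return per_slice.max()
    raise ValueError("reduce must be 'mean' or 'max'")


rng = np.random.default_rng(0)
returns_a = rng.normal(size=(200, 3))            # samples of a 3-D return
returns_b = returns_a + np.array([1.0, 0.0, 0.0])  # same samples, shifted
dirs = random_directions(64, 3, rng)

uniform_value = sliced_wasserstein(returns_a, returns_b, dirs, reduce="mean")
max_value = sliced_wasserstein(returns_a, returns_b, dirs, reduce="max")
```

Because each projection reduces the problem to one dimension, the cost per slice is dominated by a sort, which is what keeps the lifted divergence tractable. Swapping `wasserstein_1d` for a one-dimensional Cramér or MMD estimator would give the other variants named in the abstract, under the same assumptions.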

Problem

Research questions and friction points this paper is trying to address.

Multivariate Distributional Reinforcement Learning

Sliced Divergences

Bellman Contraction

Matrix Discounting

Return Distribution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Sliced Distributional Reinforcement Learning

Multivariate Return Distributions

Bellman Contraction