Extensions of Robbins-Siegmund Theorem with Applications in Reinforcement Learning

📅 2025-09-30
🤖 AI Summary
The classical Robbins–Siegmund theorem requires summability of zero-order terms—a condition too restrictive for canonical reinforcement learning algorithms such as Q-learning. Method: We propose a fundamental extension: under only square-summability (not summability) of zero-order terms and mild control assumptions on the increments, we establish almost-sure convergence of quasi-supermartingales to bounded sets. Contribution/Results: Our analysis yields three novel convergence guarantees: (1) almost-sure convergence rates; (2) high-probability concentration bounds; and (3) Lᵖ-convergence rates for all p ≥ 1. Crucially, this is the first framework to simultaneously deliver all three guarantees for Q-learning with linear function approximation—breaking key limitations of classical stochastic approximation theory and significantly enhancing both the applicability and precision of convergence analysis in reinforcement learning.
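For reference, the classical Robbins–Siegmund theorem (a standard statement from the stochastic approximation literature, paraphrased here rather than quoted from this paper) concerns nonnegative adapted processes $X_t, Y_t, Z_t, \beta_t \ge 0$ satisfying

```latex
% Classical Robbins--Siegmund theorem (standard form):
\mathbb{E}\left[X_{t+1} \mid \mathcal{F}_t\right]
  \le (1+\beta_t)\,X_t - Y_t + Z_t,
\qquad
\sum_t \beta_t < \infty,\quad
\sum_t Z_t < \infty \ \text{a.s.}
% Conclusion: X_t converges almost surely and \sum_t Y_t < \infty a.s.
```

The extension described above relaxes the summability requirement on the zero-order term $Z_t$ to square-summability, $\sum_t Z_t^2 < \infty$, at the price of concluding convergence to a bounded set rather than to a point.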

📝 Abstract
The Robbins–Siegmund theorem establishes the convergence of stochastic processes that are almost supermartingales and is foundational for analyzing a wide range of stochastic iterative algorithms in stochastic approximation and reinforcement learning (RL). However, its original form has a significant limitation: it requires the zero-order term to be summable. In many important RL applications, this summability condition cannot be met. This limitation motivates us to extend the Robbins–Siegmund theorem to almost supermartingales whose zero-order term is not summable but only square summable. In particular, we introduce a novel and mild assumption on the increments of the stochastic processes. Together with the square-summability condition, this assumption enables almost sure convergence to a bounded set. Additionally, we provide almost sure convergence rates, high probability concentration bounds, and $L^p$ convergence rates. We then apply the new results in stochastic approximation and RL. Notably, we obtain the first almost sure convergence rate, the first high probability concentration bound, and the first $L^p$ convergence rate for $Q$-learning with linear function approximation.
Problem

Research questions and friction points this paper is trying to address.

Extends Robbins-Siegmund theorem for non-summable zero-order terms
Establishes convergence rates for Q-learning with function approximation
Provides convergence guarantees under square-summable conditions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extended Robbins-Siegmund theorem with square-summable condition
Introduced mild assumption on stochastic process increments
Applied new convergence results to Q-learning with linear approximation
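To make the target of the analysis concrete, here is a minimal sketch of Q-learning with linear function approximation on a randomly generated toy MDP. This is an illustration only, not the paper's algorithm or analysis; the MDP, features, and behavior policy are all hypothetical. The step sizes α_t = 1/t are square summable but not summable, the standard regime in which this kind of convergence analysis applies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP: 2 states, 2 actions (not from the paper)
n_states, n_actions, gamma = 2, 2, 0.9
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a] is a distribution over next states
R = rng.standard_normal((n_states, n_actions))                    # deterministic rewards R[s, a]

d = 3                                              # feature dimension
phi = rng.standard_normal((n_states, n_actions, d))  # fixed linear features phi[s, a]

w = np.zeros(d)  # weight vector; Q(s, a) is approximated by phi[s, a] @ w
s = 0
for t in range(1, 20001):
    a = rng.integers(n_actions)                    # uniform behavior policy
    s2 = rng.choice(n_states, p=P[s, a])           # sample next state
    alpha = 1.0 / t                                # sum alpha_t = inf, sum alpha_t^2 < inf
    # Temporal-difference error with a max over next-state action values
    td = R[s, a] + gamma * np.max(phi[s2] @ w) - phi[s, a] @ w
    w = w + alpha * td * phi[s, a]                 # semi-gradient Q-learning update
    s = s2
```

The quantities tracked by the paper's extended theorem (e.g. the distance of `w` to a bounded set) are exactly what such iterates produce; with linear function approximation the iterates need not converge to a single point in general, which is why convergence to a bounded set is the natural guarantee.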
Xinyu Liu
Department of Computer Science, University of Virginia
Zixuan Xie
Department of Computer Science, University of Virginia
Shangtong Zhang
University of Virginia
reinforcement learning, stochastic approximation