A General-Purpose Theorem for High-Probability Bounds of Stochastic Approximation with Polyak Averaging

📅 2025-05-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Polyak-Ruppert averaging lacks non-asymptotic high-probability error bounds for general stochastic approximation (SA) algorithms. Method: a general framework that converts concentration bounds on the unaveraged SA iterates into a sharp concentration bound on the Polyak-averaged iterates, combining stochastic approximation theory, concentration inequalities, and iterative error-propagation analysis, thereby avoiding the classical reliance on asymptotic analysis and on restrictive stationarity or linearity assumptions. Contributions: (1) a unified error-control paradigm applicable to contractive SA, temporal difference (TD) learning, and Q-learning, including non-stationary and nonlinear dynamics; (2) high-probability upper bounds on the estimation error with explicit constant factors, shown to be tight up to constant multiplicative factors via a matching example; (3) new non-asymptotic concentration bounds for Polyak-averaged TD and Q-learning in settings where traditional analysis is challenging.
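
To fix ideas, the setting can be sketched as below; the notation (F, alpha_k, w_k) is illustrative and may differ from the paper's own.

```latex
% Generic SA recursion and Polyak-Ruppert average (illustrative notation):
\[
  x_{k+1} = x_k + \alpha_k \bigl( F(x_k) + w_k \bigr), \qquad
  \bar{x}_n = \frac{1}{n} \sum_{k=0}^{n-1} x_k ,
\]
% where F is the (possibly contractive) mean operator, \alpha_k the step
% size, and w_k the noise. Per the abstract, the framework takes as input
% per-iterate concentration bounds of the form
%   \Pr\bigl( \|x_k - x^*\| \ge \varepsilon_k(\delta) \bigr) \le \delta
% and outputs a sharp high-probability bound on \|\bar{x}_n - x^*\|.
```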

📝 Abstract
Polyak-Ruppert averaging is a widely used technique to achieve the optimal asymptotic variance of stochastic approximation (SA) algorithms, yet its high-probability performance guarantees remain underexplored in general settings. In this paper, we present a general framework for establishing non-asymptotic concentration bounds for the error of averaged SA iterates. Our approach assumes access to individual concentration bounds for the unaveraged iterates and yields a sharp bound on the averaged iterates. We also construct an example showing the tightness of our result up to constant multiplicative factors. As direct applications, we derive tight concentration bounds for contractive SA algorithms and for algorithms such as temporal difference learning and Q-learning with averaging, obtaining new bounds in settings where traditional analysis is challenging.
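
As an illustration of the averaging scheme the bounds apply to, here is a minimal sketch of tabular TD(0) with Polyak-Ruppert averaging. The environment interface, step-size schedule, and all names are our assumptions for the sketch, not the paper's.

```python
import numpy as np

def td0_polyak(sample_transition, n_states, gamma=0.9, n_iters=10_000, seed=0):
    """Tabular TD(0) with Polyak-Ruppert averaging (illustrative sketch).

    sample_transition(rng) -> (s, r, s_next): one observed transition.
    Returns the last iterate and the running average of all iterates;
    the paper's bounds concern the error of the averaged iterate.
    """
    rng = np.random.default_rng(seed)
    v = np.zeros(n_states)        # unaveraged SA iterate
    v_bar = np.zeros(n_states)    # Polyak-Ruppert average of the iterates
    for k in range(n_iters):
        s, r, s_next = sample_transition(rng)
        alpha = 1.0 / (k + 1) ** 0.75      # assumed polynomial step size
        td_error = r + gamma * v[s_next] - v[s]
        v[s] += alpha * td_error           # standard TD(0) update
        v_bar += (v - v_bar) / (k + 1)     # online running average
    return v, v_bar

# Toy usage: a 2-state chain with uniform transitions, reward 1 in state 0.
def sample_transition(rng):
    s = int(rng.integers(2))
    return s, float(s == 0), int(rng.integers(2))

v_last, v_avg = td0_polyak(sample_transition, n_states=2)
```

The running average is maintained online so no history of iterates needs to be stored; this is the standard constant-memory form of Polyak-Ruppert averaging.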
Problem

Research questions and friction points this paper is trying to address.

High-probability bounds for Polyak-Ruppert averaged stochastic approximation
Non-asymptotic concentration bounds for averaged SA iterates
Tight bounds for contractive SA, TD learning, and Q-learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

General framework for non-asymptotic concentration bounds
Sharp bounds on averaged stochastic approximation iterates (schematic form sketched after this list)
Tight concentration bounds for contractive algorithms
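
The bounds' schematic shape for the averaged iterate is paraphrased below; the constant C and the rate are placeholders, not the paper's exact statement.

```latex
% Schematic high-probability bound for the Polyak-averaged iterate
% (placeholder constant C and rate; not the paper's exact statement):
\[
  \Pr\!\left( \|\bar{x}_n - x^*\| \ge C \sqrt{\frac{\log(1/\delta)}{n}} \right)
  \le \delta \qquad \text{for all } \delta \in (0,1).
\]
```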
S. Khodadadian
Virginia Polytechnic Institute and State University
Martin Zubeldia
University of Minnesota
Applied probability · Queueing theory · Stochastic Approximation