Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation

📅 2024-10-07
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work investigates the non-asymptotic error behavior of constant-step-size stochastic gradient descent (SGD) for strongly convex and smooth optimization, with the goal of precisely characterizing its bias–variance trade-off. Methodologically, the authors establish, for the first time, geometric ergodicity of the chain of SGD iterates under a weighted Wasserstein semi-metric, and combine Polyak–Ruppert averaging with Richardson–Romberg extrapolation. This yields a fine-grained non-asymptotic expansion of the error of the resulting estimator: the root mean-squared error (root-MSE) has a leading term of order $O(n^{-1/2})$ and a second-order term achieving the best-known rate $O(n^{-3/4})$, and the analysis extends to higher-order moment bounds. The expansion features an explicit covariance structure, significantly refining the characterization of root-MSE convergence rates. This provides a theoretical benchmark for constant-step-size SGD with explicit constants and higher-order accuracy.
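To make the construction concrete, here is a minimal Python sketch of the estimator described above: run constant-step-size SGD with Polyak–Ruppert averaging at step sizes $\gamma$ and $2\gamma$, then form the Richardson–Romberg combination $2\bar{\theta}^{(\gamma)}_n - \bar{\theta}^{(2\gamma)}_n$. The combination works because the averaged iterate has a bias expansion of the form $\theta^\star + \gamma b + O(\gamma^2)$ in the step size, so doubling the step and subtracting cancels the first-order term. The quadratic objective, the noise model, and every function name below are illustrative assumptions, not taken from the paper; whether the two chains should share their noise sequence is a design choice, and the sketch uses independent streams for simplicity.

```python
import numpy as np

def sgd_pr_average(grad, theta0, gamma, n, rng):
    """Constant-step SGD with Polyak-Ruppert averaging of the iterates."""
    theta = theta0.copy()
    avg = np.zeros_like(theta0)
    for t in range(n):
        theta = theta - gamma * grad(theta, rng)
        avg += (theta - avg) / (t + 1)   # running mean of the iterates
    return avg

def richardson_romberg(grad, theta0, gamma, n, seed=0):
    """Combine PR averages at step sizes gamma and 2*gamma; the linear
    combination 2*avg(gamma) - avg(2*gamma) cancels the O(gamma) bias."""
    avg_g = sgd_pr_average(grad, theta0, gamma, n, np.random.default_rng(seed))
    avg_2g = sgd_pr_average(grad, theta0, 2 * gamma, n, np.random.default_rng(seed + 1))
    return 2.0 * avg_g - avg_2g

# Toy problem (illustrative): f(theta) = 0.5 theta^T A theta - b^T theta,
# strongly convex (mu = 1) and smooth (L = 4), with additive gradient noise.
d = 5
A = np.diag(np.linspace(1.0, 4.0, d))
b = np.ones(d)
theta_star = np.linalg.solve(A, b)

def noisy_grad(theta, rng):
    return A @ theta - b + 0.1 * rng.standard_normal(d)

estimate = richardson_romberg(noisy_grad, np.zeros(d), gamma=0.05, n=100_000)
print("error:", np.linalg.norm(estimate - theta_star))
```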

📝 Abstract
We address the problem of solving strongly convex and smooth minimization problems using the stochastic gradient descent (SGD) algorithm with a constant step size. Previous works suggested combining the Polyak-Ruppert averaging procedure with the Richardson-Romberg extrapolation to reduce the asymptotic bias of SGD at the expense of a mild increase of the variance. We significantly extend previous results by providing an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations $n$. We show that the root mean-squared error can be decomposed into the sum of two terms: a leading one of order $\mathcal{O}(n^{-1/2})$ with explicit dependence on a minimax-optimal asymptotic covariance matrix, and a second-order term of order $\mathcal{O}(n^{-3/4})$, where the power $3/4$ is best known. We also extend this result to higher-order moment bounds. Our analysis relies on the properties of the SGD iterates viewed as a time-homogeneous Markov chain. In particular, we establish that this chain is geometrically ergodic with respect to a suitably defined weighted Wasserstein semimetric.
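Spelled out, the stated decomposition has the following shape. The notation here is mine, not the paper's: $\bar{\theta}^{\mathrm{RR}}_n$ denotes the extrapolated estimator, $\theta^{\star}$ the minimizer, and $c(\Sigma_\infty)$ a constant depending explicitly on the minimax-optimal asymptotic covariance matrix referenced above.

```latex
% Root-MSE expansion (illustrative rendering of the abstract's claim):
\mathbb{E}^{1/2}\!\left[ \bigl\| \bar{\theta}^{\mathrm{RR}}_n - \theta^{\star} \bigr\|^{2} \right]
  = \underbrace{\frac{c(\Sigma_\infty)}{\sqrt{n}}}_{\text{leading term}}
  + \underbrace{\mathcal{O}\!\left(n^{-3/4}\right)}_{\text{second-order term}}
```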
Problem

Research questions and friction points this paper is trying to address.

Reducing asymptotic bias in SGD with Richardson-Romberg extrapolation
Analyzing mean-squared error expansion for constant step-size SGD
Establishing geometric ergodicity of SGD as Markov chain
Innovation

Methods, ideas, or system contributions that make the work stand out.

Richardson-Romberg extrapolation reduces SGD bias
MSE expansion with explicit covariance matrix dependence
Geometric ergodicity in weighted Wasserstein semimetric (see the toy coupling sketch after this list)
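To give intuition for the geometric ergodicity claim, here is a toy one-dimensional coupling experiment. It is my own illustration, not the paper's proof technique: two constant-step SGD chains on a quadratic, driven by the same noise, have a gap that contracts geometrically at rate $(1 - \gamma a)$ per step, which is the kind of Wasserstein contraction the ergodicity result formalizes.

```python
import numpy as np

# Two SGD chains on f(x) = 0.5 * a * x**2 with shared noise (synchronous
# coupling). The gap obeys x' - y' = (1 - gamma * a) * (x - y), so it
# contracts geometrically -- a toy picture of Wasserstein ergodicity.
a, gamma, n_steps = 2.0, 0.1, 50
rng = np.random.default_rng(0)
x, y = 10.0, -10.0
for _ in range(n_steps):
    xi = rng.standard_normal()      # same noise fed to both chains
    x -= gamma * (a * x + xi)
    y -= gamma * (a * y + xi)
print(abs(x - y), 20.0 * (1 - gamma * a) ** n_steps)  # agree up to rounding
```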
👥 Authors

Marina Sheshukova
HSE University
Markov chains, high-dimensional probability

D. Belomestny
Duisburg-Essen University, HSE University

Alain Durmus
École polytechnique
Machine learning, statistics

Éric Moulines
CMAP, École Polytechnique, Institut Polytechnique de Paris, 91128 Palaiseau, France; Mohamed Bin Zayed University of AI

Alexey Naumov
Professor, HSE University
Probability theory, statistics, machine learning, random matrices, reinforcement learning

S. Samsonov
HSE University