Nonasymptotic Analysis of Stochastic Gradient Descent with the Richardson-Romberg Extrapolation

📅 2024-10-07
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
This work investigates the non-asymptotic error behavior of constant-step-size stochastic gradient descent (SGD) for strongly convex and smooth optimization, with the goal of precisely characterizing its bias–variance trade-off. Methodologically, the authors establish, for the first time, geometric ergodicity of the chain of SGD iterates under a weighted Wasserstein semi-metric, and combine Polyak–Ruppert averaging with Richardson–Romberg extrapolation. This yields a fine-grained non-asymptotic expansion of the error of the resulting estimator: the root mean-squared error (root-MSE) has a leading term of order $O(n^{-1/2})$ and a second-order term achieving the best-known rate $O(n^{-3/4})$, and the analysis extends to higher-order moment bounds. The expansion features an explicit covariance structure, significantly refining the characterization of root-MSE convergence rates. This provides a theoretical benchmark for constant-step-size SGD with explicit constants and higher-order accuracy.
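To make the construction concrete, here is a minimal Python sketch of the estimator described above: run constant-step-size SGD with Polyak–Ruppert averaging at step sizes $\gamma$ and $2\gamma$, then form the Richardson–Romberg combination $2\bar{\theta}^{(\gamma)}_n - \bar{\theta}^{(2\gamma)}_n$. The combination works because the averaged iterate has a bias expansion of the form $\theta^\star + \gamma b + O(\gamma^2)$ in the step size, so doubling the step and subtracting cancels the first-order term. The quadratic objective, the noise model, and every function name below are illustrative assumptions, not taken from the paper; whether the two chains should share their noise sequence is a design choice, and the sketch uses independent streams for simplicity.

```python
import numpy as np

def sgd_pr_average(grad, theta0, gamma, n, rng):
    """Constant-step SGD with Polyak-Ruppert averaging of the iterates."""
    theta = theta0.copy()
    avg = np.zeros_like(theta0)
    for t in range(n):
        theta = theta - gamma * grad(theta, rng)
        avg += (theta - avg) / (t + 1)   # running mean of the iterates
    return avg

def richardson_romberg(grad, theta0, gamma, n, seed=0):
    """Combine PR averages at step sizes gamma and 2*gamma; the linear
    combination 2*avg(gamma) - avg(2*gamma) cancels the O(gamma) bias."""
    avg_g = sgd_pr_average(grad, theta0, gamma, n, np.random.default_rng(seed))
    avg_2g = sgd_pr_average(grad, theta0, 2 * gamma, n, np.random.default_rng(seed + 1))
    return 2.0 * avg_g - avg_2g

# Toy problem (illustrative): f(theta) = 0.5 theta^T A theta - b^T theta,
# strongly convex (mu = 1) and smooth (L = 4), with additive gradient noise.
d = 5
A = np.diag(np.linspace(1.0, 4.0, d))
b = np.ones(d)
theta_star = np.linalg.solve(A, b)

def noisy_grad(theta, rng):
    return A @ theta - b + 0.1 * rng.standard_normal(d)

estimate = richardson_romberg(noisy_grad, np.zeros(d), gamma=0.05, n=100_000)
print("error:", np.linalg.norm(estimate - theta_star))
```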

📝 Abstract
We address the problem of solving strongly convex and smooth minimization problems using the stochastic gradient descent (SGD) algorithm with a constant step size. Previous works suggested combining the Polyak-Ruppert averaging procedure with the Richardson-Romberg extrapolation to reduce the asymptotic bias of SGD at the expense of a mild increase of the variance. We significantly extend previous results by providing an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations $n$. We show that the root mean-squared error can be decomposed into the sum of two terms: a leading one of order $\mathcal{O}(n^{-1/2})$ with explicit dependence on a minimax-optimal asymptotic covariance matrix, and a second-order term of order $\mathcal{O}(n^{-3/4})$, where the power $3/4$ is best known. We also extend this result to higher-order moment bounds. Our analysis relies on the properties of the SGD iterates viewed as a time-homogeneous Markov chain. In particular, we establish that this chain is geometrically ergodic with respect to a suitably defined weighted Wasserstein semimetric.
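Spelled out, the stated decomposition has the following shape. The notation here is mine, not the paper's: $\bar{\theta}^{\mathrm{RR}}_n$ denotes the extrapolated estimator, $\theta^{\star}$ the minimizer, and $c(\Sigma_\infty)$ a constant depending explicitly on the minimax-optimal asymptotic covariance matrix referenced above.

```latex
% Root-MSE expansion (illustrative rendering of the abstract's claim):
\mathbb{E}^{1/2}\!\left[ \bigl\| \bar{\theta}^{\mathrm{RR}}_n - \theta^{\star} \bigr\|^{2} \right]
  = \underbrace{\frac{c(\Sigma_\infty)}{\sqrt{n}}}_{\text{leading term}}
  + \underbrace{\mathcal{O}\!\left(n^{-3/4}\right)}_{\text{second-order term}}
```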
Problem

Research questions and friction points this paper is trying to address.

Reducing asymptotic bias in SGD with Richardson-Romberg extrapolation
Analyzing mean-squared error expansion for constant step-size SGD
Establishing geometric ergodicity of SGD as Markov chain
Innovation

Methods, ideas, or system contributions that make the work stand out.

Richardson-Romberg extrapolation reduces SGD bias
MSE expansion with explicit covariance matrix dependence
Geometric ergodicity in weighted Wasserstein semimetric (see the toy coupling sketch after this list)
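To give intuition for the geometric ergodicity claim, here is a toy one-dimensional coupling experiment. It is my own illustration, not the paper's proof technique: two constant-step SGD chains on a quadratic, driven by the same noise, have a gap that contracts geometrically at rate $(1 - \gamma a)$ per step, which is the kind of Wasserstein contraction the ergodicity result formalizes.

```python
import numpy as np

# Two SGD chains on f(x) = 0.5 * a * x**2 with shared noise (synchronous
# coupling). The gap obeys x' - y' = (1 - gamma * a) * (x - y), so it
# contracts geometrically -- a toy picture of Wasserstein ergodicity.
a, gamma, n_steps = 2.0, 0.1, 50
rng = np.random.default_rng(0)
x, y = 10.0, -10.0
for _ in range(n_steps):
    xi = rng.standard_normal()      # same noise fed to both chains
    x -= gamma * (a * x + xi)
    y -= gamma * (a * y + xi)
print(abs(x - y), 20.0 * (1 - gamma * a) ** n_steps)  # agree up to rounding
```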
👥 Authors

Marina Sheshukova
HSE University
Markov chains, high-dimensional probability

D. Belomestny
Duisburg-Essen University, HSE University

Alain Durmus
École polytechnique
Machine learning, statistics

Éric Moulines
CMAP, École Polytechnique, Institut Polytechnique de Paris, 91128 Palaiseau, France; Mohamed Bin Zayed University of AI

Alexey Naumov
Professor, HSE University
Probability theory, statistics, machine learning, random matrices, reinforcement learning

S. Samsonov
HSE University