🤖 AI Summary
This work investigates the non-asymptotic error behavior of constant-step-size stochastic gradient descent (SGD) for strongly convex and smooth optimization, with the goal of precisely characterizing its bias–variance trade-off. Methodologically, we establish, for the first time, geometric ergodicity of the SGD iteration chain under a weighted Wasserstein semimetric, and combine Polyak–Ruppert averaging with Richardson–Romberg extrapolation. This yields a fine-grained non-asymptotic expansion of the root mean-squared error (root-MSE) of the resulting estimator: the leading term is $O(n^{-1/2})$, the second-order term is $O(n^{-3/4})$, which is the best rate currently known, and the analysis extends to higher-order moment bounds. The leading term depends explicitly on the asymptotic covariance matrix, significantly refining the characterization of root-MSE convergence rates. To our knowledge, this provides the first theoretical benchmark for constant-step-size SGD with explicit constants and high-order accuracy.
📝 Abstract
We address the problem of solving strongly convex and smooth minimization problems using the stochastic gradient descent (SGD) algorithm with a constant step size. Previous works suggested combining the Polyak-Ruppert averaging procedure with Richardson-Romberg extrapolation to reduce the asymptotic bias of SGD at the expense of a mild increase in the variance. We significantly extend previous results by providing an expansion of the mean-squared error of the resulting estimator with respect to the number of iterations $n$. We show that the root mean-squared error can be decomposed into the sum of two terms: a leading one of order $\mathcal{O}(n^{-1/2})$ with explicit dependence on a minimax-optimal asymptotic covariance matrix, and a second-order term of order $\mathcal{O}(n^{-3/4})$, where the exponent $3/4$ is the best known. We also extend this result to higher-order moment bounds. Our analysis relies on the properties of the SGD iterates viewed as a time-homogeneous Markov chain. In particular, we establish that this chain is geometrically ergodic with respect to a suitably defined weighted Wasserstein semimetric.
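
As an illustration of the estimator described in the abstract, the following minimal sketch runs constant-step SGD at step sizes $\gamma$ and $2\gamma$, takes the Polyak-Ruppert average of each run, and forms the Richardson-Romberg combination $2\bar{\theta}^{(\gamma)} - \bar{\theta}^{(2\gamma)}$, which cancels the bias term that is linear in $\gamma$. Everything specific here is an assumption made for illustration and is not taken from the paper: the one-dimensional toy objective, the additive Gaussian gradient noise, the use of independent noise for the two runs, and the function names `stochastic_grad`, `averaged_sgd`, and `richardson_romberg`.

```python
import math
import random


def stochastic_grad(theta, rng, noise_std=1.0):
    """Noisy gradient of a toy objective whose minimizer is theta* = 0.

    The exact gradient is f'(theta) = theta + tanh(theta - 1) + tanh(1),
    so f'' lies in (1, 2]: the objective is strongly convex and smooth,
    and it is not quadratic, which is what makes constant-step SGD biased.
    """
    exact = theta + math.tanh(theta - 1.0) + math.tanh(1.0)
    return exact + rng.gauss(0.0, noise_std)


def averaged_sgd(step, n_iters, rng, theta0=1.0):
    """Run constant-step SGD and return the Polyak-Ruppert average."""
    theta = theta0
    running_sum = 0.0
    for _ in range(n_iters):
        theta -= step * stochastic_grad(theta, rng)
        running_sum += theta
    return running_sum / n_iters


def richardson_romberg(step, n_iters, seed=0):
    """Combine averaged runs at `step` and `2 * step`.

    Returns (extrapolated estimate, average at `step`, average at `2 * step`).
    The two runs draw from the same generator one after the other, so their
    noise sequences are independent (an assumption of this sketch).
    """
    rng = random.Random(seed)
    avg_small = averaged_sgd(step, n_iters, rng)
    avg_large = averaged_sgd(2.0 * step, n_iters, rng)
    return 2.0 * avg_small - avg_large, avg_small, avg_large


if __name__ == "__main__":
    rr, avg_small, avg_large = richardson_romberg(step=0.05, n_iters=20000)
    print("averaged SGD, step 0.05:", avg_small)
    print("averaged SGD, step 0.10:", avg_large)
    print("Richardson-Romberg     :", rr)
```

The extrapolated estimate uses twice as many gradient evaluations as a single averaged run and has a somewhat larger variance, which is the mild variance increase the abstract refers to. On a single short run the bias reduction can be hidden by sampling noise, so the effect is best seen by averaging the error over many seeds.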