Improved Last-Iterate Convergence of Shuffling Gradient Methods for Nonsmooth Convex Optimization

📅 2025-05-29
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work investigates the last-iterate convergence of the Random Reshuffle (RR) and Single Shuffle (SS) gradient methods for nonsmooth (strongly) convex optimization. Addressing the limitation that existing theory only recovers the convergence rate of proximal gradient descent (Prox-GD), we establish, for the first time, that RR and SS achieve strictly faster last-iterate rates than Prox-GD in the nonsmooth strongly convex setting. Moreover, we propose a suffix-averaging scheme for RR, attaining a (nearly) optimal $O(1/\sqrt{T})$ convergence rate, matching the known lower bound. Methodologically, we develop a function-wise proximal gradient framework that integrates stochastic permutation analysis, refined error-bound derivation, and suffix averaging. Our results demonstrate that the structured randomness induced by random reshuffling yields intrinsic acceleration, breaking the rate barriers inherent to deterministic methods.
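To make the algorithmic setup concrete, the sketch below shows a generic shuffling proximal gradient loop covering both strategies discussed here: RR redraws the permutation every epoch, while SS fixes a single permutation up front. This is a minimal illustration, not the authors' implementation; the function name `shuffling_prox_gd`, the `step` schedule, and the argument layout are all assumptions.

```python
import numpy as np

def shuffling_prox_gd(grads, prox, x0, epochs, step, strategy="RR", seed=0):
    """Illustrative shuffling proximal gradient method (not the paper's code).

    grads    -- list of per-component gradient oracles: grads[i](x) ~ f_i'(x)
    prox     -- proximal operator of the regularizer: prox(v, eta)
    step     -- step-size schedule, e.g. step(t) = c / np.sqrt(t + 1)
    strategy -- "RR" reshuffles each epoch; "SS" reuses one permutation
    """
    rng = np.random.default_rng(seed)
    n = len(grads)
    perm = rng.permutation(n)            # SS: drawn once, reused every epoch
    x = np.asarray(x0, dtype=float)
    for t in range(epochs):
        if strategy == "RR":
            perm = rng.permutation(n)    # RR: fresh permutation each epoch
        eta = step(t)
        for i in perm:                   # one full pass over the components
            x = prox(x - eta * grads[i](x), eta)
    return x
```

For instance, with components $f_i(x) = \tfrac{1}{2}(a_i^\top x - b_i)^2$ and an $\ell_1$ regularizer, `prox` would be the soft-thresholding operator. Keeping the step size fixed within an epoch is a common convention in shuffling analyses; per-iteration schedules fit the same template.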

๐Ÿ“ Abstract
We study the convergence of the shuffling gradient method, a popular algorithm for minimizing finite-sum functions with regularization, in which the component functions are processed one at a time by (Proximal) Gradient Descent (GD), in an order determined by a permutation of the function indices. Despite its easy implementation and effective performance in practice, its theoretical understanding remains limited. A recent advance by (Liu & Zhou, 2024b) establishes the first last-iterate convergence results under various settings, in particular proving the optimal rates for smooth (strongly) convex optimization. However, their bounds for nonsmooth (strongly) convex functions are only as fast as those of Proximal GD. In this work, we provide the first improved last-iterate analysis for the nonsmooth case, demonstrating that the widely used Random Reshuffle ($\textsf{RR}$) and Single Shuffle ($\textsf{SS}$) strategies are both provably faster than Proximal GD, reflecting the benefit of randomness. As an important implication, we give the first (nearly) optimal convergence result for the suffix average under the $\textsf{RR}$ sampling scheme in the general convex case, matching the lower bound shown by (Koren et al., 2022).
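The suffix average referred to above can be formed from the stored trajectory of iterates. Below is a minimal sketch under the assumption that one averages a trailing fraction of the iterates; `suffix_average` and the fraction `alpha` are illustrative names, and the exact suffix analyzed in the paper may differ.

```python
import numpy as np

def suffix_average(iterates, alpha=0.5):
    # Average the last ceil(alpha * T) of the T stored iterates.
    # alpha = 0.5 averages the second half of the trajectory; the precise
    # fraction used in the paper's analysis is not assumed here.
    T = len(iterates)
    k = max(1, int(np.ceil(alpha * T)))
    return np.mean(np.asarray(iterates)[-k:], axis=0)
```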
Problem

Research questions and friction points this paper is trying to address.

Improving last-iterate convergence for nonsmooth convex optimization
Proving Random Reshuffle and Single Shuffle outperform Proximal GD
Establishing optimal convergence for suffix average in general convex case
Innovation

Methods, ideas, or system contributions that make the work stand out.

Improved last-iterate convergence for nonsmooth convex optimization
Random Reshuffle and Single Shuffle faster than Proximal GD
Nearly optimal convergence for suffix average with RR