🤖 AI Summary
In infinite-dimensional quadratic optimization, Jacobi-type heavy-ball (HB) methods under mini-batch SGD diverge due to sampling noise and fail to achieve the optimal deterministic convergence rate $O(t^{-2\zeta})$.
Method: This paper proposes a generalized SGD algorithm based on an angular contour in the complex plane, introducing an exterior-angle parameter $\theta\pi$ that explicitly trades off acceleration against noise amplification.
Contribution/Results: We establish, for the first time, a quantitative link between contour geometry and convergence order $O(t^{-\theta\zeta})$, and derive the stochastic optimal acceleration bound $\theta_{\max} = \min\{2, \nu, 2/(\zeta + 1/\nu)\}$, overcoming the fundamental failure of classical HB in stochastic settings. Leveraging complex-analytic modeling, rational approximation of power functions, and an infinite-memory generalization framework, under joint spectral assumptions (capacity and source conditions), our method attains near-optimal stochastic rates approaching $O(t^{-2\zeta})$. Experiments on MNIST and synthetic tasks demonstrate significant improvements over standard SGD and Heavy Ball.
📝 Abstract
We consider SGD-type optimization on infinite-dimensional quadratic problems with power-law spectral conditions. It is well known that on such problems deterministic GD has loss convergence rate $L_t=O(t^{-\zeta})$, which can be improved to $L_t=O(t^{-2\zeta})$ by using Heavy Ball with a non-stationary Jacobi-based schedule (and the latter rate is optimal among fixed schedules). However, in the mini-batch Stochastic GD setting, the sampling noise causes the Jacobi HB to diverge; accordingly, no $O(t^{-2\zeta})$ algorithm is known. In this paper we show that rates up to $O(t^{-2\zeta})$ can be achieved by a generalized stationary SGD with infinite memory. We start by identifying generalized (S)GD algorithms with contours in the complex plane. We then show that contours having a corner with external angle $\theta\pi$ accelerate the plain GD rate $O(t^{-\zeta})$ to $O(t^{-\theta\zeta})$. For deterministic GD, increasing $\theta$ allows one to achieve rates arbitrarily close to $O(t^{-2\zeta})$. However, in Stochastic GD, increasing $\theta$ also amplifies the sampling noise, so in general $\theta$ needs to be optimized by balancing the acceleration and noise effects. We prove that the optimal rate is given by $\theta_{\max}=\min\big(2, \nu, \frac{2}{\zeta+1/\nu}\big)$, where $\nu, \zeta$ are the exponents appearing in the capacity and source spectral conditions. Furthermore, using fast rational approximations of the power functions, we show that ideal corner algorithms can be efficiently approximated by finite-memory algorithms, and we demonstrate their practical efficiency on a synthetic problem and MNIST.