🤖 AI Summary
Traditional acceleration methods (e.g., NAG, classical momentum) often diverge in ill-conditioned or nonconvex optimization due to uncontrolled momentum accumulation.
Method: We propose HB-SGE, which integrates predictive gradient extrapolation into the heavy-ball (HB) momentum framework. Using a local Taylor approximation, HB-SGE anticipates upcoming gradient directions, enabling adaptive acceleration while preserving stability. It requires only O(d) memory, matching standard first-order methods.
Contribution/Results: HB-SGE comes with proven convergence guarantees for strongly convex functions and empirically avoids divergence in ill-conditioned and nonconvex settings where NAG and classical momentum fail. It converges in 119 iterations on an ill-conditioned quadratic (κ = 50) where SGD and NAG diverge, and in 2,718 iterations on the Rosenbrock function, where classical momentum diverges within 10 steps. These results broaden the practical reach of first-order optimization.
📝 Abstract
Accelerated gradient methods like Nesterov's Accelerated Gradient (NAG) achieve faster convergence on well-conditioned problems but often diverge on ill-conditioned or non-convex landscapes due to aggressive momentum accumulation. We propose Heavy-Ball Synthetic Gradient Extrapolation (HB-SGE), a robust first-order method that combines heavy-ball momentum with predictive gradient extrapolation. Unlike classical momentum methods that accumulate historical gradients, HB-SGE estimates future gradient directions using local Taylor approximations, providing adaptive acceleration while maintaining stability. We prove convergence guarantees for strongly convex functions and demonstrate empirically that HB-SGE prevents divergence on problems where NAG and standard momentum fail. On ill-conditioned quadratics (condition number $\kappa = 50$), HB-SGE converges in 119 iterations while both SGD and NAG diverge. On the non-convex Rosenbrock function, HB-SGE achieves convergence in 2,718 iterations where classical momentum methods diverge within 10 steps. While NAG remains faster on well-conditioned problems, HB-SGE provides a robust alternative with consistent speedups over SGD across diverse landscapes, requiring only $O(d)$ memory overhead and the same hyperparameters as standard momentum.
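The abstract does not spell out the update rule, but the idea can be sketched as follows: replace the raw gradient in the heavy-ball update $v_{t+1} = \beta v_t - \alpha g_t$, $x_{t+1} = x_t + v_{t+1}$ with a first-order Taylor extrapolation of the gradient along the trajectory. In this minimal sketch, the secant $g_t - g_{t-1}$ stands in for Hessian-vector information; the extrapolation weight `lam` and the sign-agreement fallback are illustrative assumptions, not the paper's exact method:

```python
import numpy as np

def hb_sge(grad, x0, lr=0.01, beta=0.9, lam=0.5, tol=1e-8, max_iter=10_000):
    """Illustrative HB-SGE sketch: heavy-ball momentum driven by a
    Taylor-extrapolated gradient estimate (assumed form, not the paper's
    exact update). Memory stays O(d): one velocity and one past gradient."""
    x = np.asarray(x0, dtype=float)
    v = np.zeros_like(x)
    g_prev = grad(x)
    for t in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            return x, t
        # First-order Taylor extrapolation of the next gradient along the
        # trajectory: the secant (g - g_prev) approximates curvature cheaply.
        g_hat = g + lam * (g - g_prev)
        # Stability guard (assumed): revert to the raw gradient whenever the
        # extrapolated direction disagrees with the current gradient.
        if np.dot(g_hat, g) <= 0.0:
            g_hat = g
        v = beta * v - lr * g_hat   # heavy-ball velocity update
        x = x + v
        g_prev = g
    return x, max_iter

# Usage on an ill-conditioned quadratic f(x) = 0.5 * x^T diag(1, 50) x:
A = np.diag([1.0, 50.0])
x_star, n_iter = hb_sge(lambda x: A @ x, np.array([1.0, 1.0]))
```

With these (assumed) defaults the sketch drives the gradient norm below `tol` on a $\kappa = 50$ quadratic; the iteration counts reported in the abstract come from the paper's own implementation, which this sketch does not claim to reproduce.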