Designing Preconditioners for SGD: Local Conditioning, Noise Floors, and Basin Stability

📅 2025-11-24
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Stochastic Gradient Descent (SGD) often stagnates in late-stage training due to anisotropic curvature and gradient noise. This paper proposes a geometric optimization framework based on a preconditioner matrix (M), unifying the characterization of local condition number, lower bound on gradient noise, and basin stability of attraction. Under the (M)-induced Riemannian metric, the product of the effective condition number and the preconditioned noise level governs both convergence rate and steady-state error lower bound. Theoretically, this work provides the first basin stability guarantee—expressed in the (M)-norm—for non-convex landscapes. The method leverages symmetric positive-definite matrices to induce a Riemannian geometry, integrating stochastic optimization with local smoothness modeling, and supports both diagonal adaptive and curvature-aware preconditioner designs. Experiments on quadratic diagnostic tasks and three scientific machine learning benchmarks validate the predicted trade-off between convergence rate and noise amplification, demonstrating substantial improvements in convergence speed and training stability.

Technology Category

Application Category

📝 Abstract
Stochastic Gradient Descent (SGD) often slows in the late stage of training due to anisotropic curvature and gradient noise. We analyze preconditioned SGD in the geometry induced by a symmetric positive definite matrix $mathbf{M}$, deriving bounds in which both the convergence rate and the stochastic noise floor are governed by $mathbf{M}$-dependent quantities: the rate through an effective condition number in the $mathbf{M}$-metric, and the floor through the product of that condition number and the preconditioned noise level. For nonconvex objectives, we establish a preconditioner-dependent basin-stability guarantee: when smoothness and basin size are measured in the $mathbf{M}$-norm, the probability that the iterates remain in a well-behaved local region admits an explicit lower bound. This perspective is particularly relevant in Scientific Machine Learning (SciML), where achieving small training loss under stochastic updates is closely tied to physical fidelity, numerical stability, and constraint satisfaction. The framework applies to both diagonal/adaptive and curvature-aware preconditioners and yields a simple design principle: choose $mathbf{M}$ to improve local conditioning while attenuating noise. Experiments on a quadratic diagnostic and three SciML benchmarks validate the predicted rate-floor behavior.
Problem

Research questions and friction points this paper is trying to address.

Analyzing preconditioned SGD convergence and noise behavior
Establishing basin stability guarantees for nonconvex optimization
Developing design principles for effective preconditioner selection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Preconditioned SGD in M-metric geometry
Effective condition number governs convergence rate
M-dependent basin stability for nonconvex objectives
🔎 Similar Papers
No similar papers found.
M
Mitchell Scott
Department of Mathematics, Emory University, Atlanta, GA
T
Tianshi Xu
Department of Mathematics, Emory University, Atlanta, GA
Ziyuan Tang
Ziyuan Tang
University of Minnesota
A
Alexandra Pichette-Emmons
Department of Mathematics, University of Kentucky, Lexington, KY
Q
Qiang Ye
Department of Mathematics, University of Kentucky, Lexington, KY
Yousef Saad
Yousef Saad
Department of Computer Science, University of Minnesota, Minneapolis, MN
Yuanzhe Xi
Yuanzhe Xi
Associate Professor, Emory University
Numerical linear algebraScientific Machine Learning