From Sublinear to Linear: Fast Convergence in Deep Networks via Locally Polyak-Lojasiewicz Regions

📅 2025-07-28
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
While gradient descent (GD) empirically exhibits linear (i.e., exponential) convergence on the non-convex loss landscapes of deep neural networks, existing theory guarantees only sublinear rates. Method: The authors introduce the notion of a *Locally Polyak-Łojasiewicz Region* (LPLR), a neighborhood of the initialization in which the loss satisfies the PL condition, and establish its existence under mild Neural Tangent Kernel (NTK) stability assumptions. Their analysis combines the NTK framework, a local geometric characterization of the landscape, and the evolution of the gradient dynamics. Contribution/Results: The work provides the first rigorous proof of linear convergence of GD for finite-width deep networks, specifically fully connected networks and ResNets, within an LPLR. Crucially, LPLRs are shown to arise robustly in practice, and the theoretical convergence rate aligns closely with empirical observations. By bridging non-convex optimization theory and deep learning practice, the study resolves a long-standing discrepancy in convergence-rate guarantees.
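The mechanism behind the linear rate can be sketched with the standard PL argument (a generic sketch; the paper's exact constants and neighborhood definitions may differ). If the loss is $L$-smooth and satisfies the PL condition $\tfrac12\|\nabla\mathcal{L}(\theta)\|^2 \ge \mu\,(\mathcal{L}(\theta)-\mathcal{L}^*)$ on the region, then one GD step with $\eta = 1/L$ gives:

```latex
\begin{aligned}
\mathcal{L}(\theta_{t+1}) - \mathcal{L}^*
  &\le \mathcal{L}(\theta_t) - \mathcal{L}^* - \tfrac{1}{2L}\,\|\nabla \mathcal{L}(\theta_t)\|^2
      && \text{($L$-smoothness, step size } \eta = 1/L\text{)} \\
  &\le \Bigl(1 - \tfrac{\mu}{L}\Bigr)\bigl(\mathcal{L}(\theta_t) - \mathcal{L}^*\bigr)
      && \text{(PL condition)}
\end{aligned}
```

so the suboptimality gap contracts by a constant factor per step, i.e., linear convergence, for as long as the iterates remain inside the LPLR.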

📝 Abstract
The convergence of gradient descent (GD) on the non-convex loss landscapes of deep neural networks (DNNs) presents a fundamental theoretical challenge. While recent work has established that GD converges to a stationary point at a sublinear rate within locally quasi-convex regions (LQCRs), this fails to explain the exponential convergence rates consistently observed in practice. In this paper, we resolve this discrepancy by proving that under a mild assumption on Neural Tangent Kernel (NTK) stability, these same regions satisfy a local Polyak-Lojasiewicz (PL) condition. We introduce the concept of a Locally Polyak-Lojasiewicz Region (LPLR), where the squared gradient norm lower-bounds the suboptimality gap, prove that properly initialized finite-width networks admit such regions around initialization, and establish that GD achieves linear convergence within an LPLR, providing the first finite-width guarantee that matches empirically observed rates. We validate our theory across diverse settings, from controlled experiments on fully-connected networks to modern ResNet architectures trained with stochastic methods, demonstrating that LPLR structure emerges robustly in practical deep learning scenarios. By rigorously connecting local landscape geometry to fast optimization through the NTK framework, our work provides a definitive theoretical explanation for the remarkable efficiency of gradient-based optimization in deep learning.
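The abstract's central claim, that a local PL condition yields a geometric decay of the suboptimality gap, can be illustrated on a toy problem. The sketch below (illustrative only; the paper's setting is finite-width networks analyzed via the NTK) runs GD on $f(x) = x^2 + 3\sin^2(x)$, a classic function that is non-convex yet satisfies a PL condition:

```python
import math

# Toy illustration of the PL-to-linear-rate mechanism on a non-convex
# but PL function, f(x) = x^2 + 3 sin^2(x), with minimum f* = 0 at x = 0.
# (Not the paper's method; just the convergence phenomenon it explains.)

def f(x):
    return x**2 + 3 * math.sin(x)**2

def grad(x):
    return 2 * x + 3 * math.sin(2 * x)   # f'(x)

x, eta = 2.0, 0.05          # step size below 1/L, since sup f'' = 8 here
gaps = [f(x)]               # track the suboptimality gap f(x_t) - f*
for _ in range(500):
    x -= eta * grad(x)
    gaps.append(f(x))

# Under the PL condition the gap contracts geometrically at every step,
# rather than decaying at a sublinear O(1/t) rate.
print(gaps[-1])             # essentially zero after 500 steps
```

The same qualitative behavior, a per-step contraction factor bounded away from 1, is what the paper establishes for GD on finite-width networks inside an LPLR.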
Problem

Research questions and friction points this paper is trying to address.

Explains fast convergence in deep networks via local PL regions
Links NTK stability to linear convergence in non-convex landscapes
Provides finite-width guarantees matching empirical optimization rates
Innovation

Methods, ideas, or system contributions that make the work stand out.

Local Polyak-Lojasiewicz Regions (LPLR) for fast convergence
Neural Tangent Kernel (NTK) stability assumption
Linear convergence guarantee in finite-width networks
Agnideep Aich
Department of Mathematics, University of Louisiana at Lafayette, Lafayette, Louisiana, USA
Ashit Baran Aich
Former Professor of Statistics, Presidency College
Statistics, Statistical Machine Learning, Probability, Statistical Learning, Deep Learning
Bruce Wade
Department of Mathematics, University of Louisiana at Lafayette, Lafayette, Louisiana, USA