Benefits of Early Stopping in Gradient Descent for Overparameterized Logistic Regression

📅 2025-02-18
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
In overparameterized logistic regression, gradient descent (GD) converges in direction to the maximum $\ell_2$-margin solution but exhibits divergent parameter norms, leading to poor generalization and miscalibration. This work systematically characterizes the implicit statistical regularization effect of early stopping for GD. We first prove that early-stopped GD achieves statistical consistency in high-dimensional overparameterized regimes where standard GD is inconsistent. We then establish a non-asymptotic equivalence between early-stopped GD and $\ell_2$-regularized solutions, deriving tight upper bounds on both norm and angular deviation. Moreover, we show that early-stopped GD drives the excess logistic risk to zero and attains vanishing zero-one risk with only polynomially many samples, in sharp contrast with the exponential sample requirement of standard GD, revealing a fundamental separation in sample complexity. Our analysis integrates high-dimensional asymptotics, non-asymptotic risk bounds, geometric characterization, and a coupled analysis of the zero-one and logistic risks.
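
For context, a minimal sketch of the standard setup behind this summary (illustrative notation under common assumptions, not the paper's exact statements): GD on the unregularized logistic loss over separable data diverges in norm while its direction converges to the maximum-$\ell_2$-margin separator, and stopping at step $t$ behaves roughly like $\ell_2$-regularization of strength $1/(\eta t)$.

```latex
% Minimal sketch of the setup (illustrative notation; not the paper's exact statement).
% Empirical logistic risk and plain gradient descent:
\[
  \widehat{L}(w) = \frac{1}{n} \sum_{i=1}^{n} \log\!\bigl(1 + e^{-y_i \langle w, x_i \rangle}\bigr),
  \qquad
  w_{t+1} = w_t - \eta \, \nabla \widehat{L}(w_t).
\]
% Implicit bias on linearly separable data: the norm diverges while the direction
% converges to the maximum-$\ell_2$-margin separator:
\[
  \|w_t\|_2 \to \infty,
  \qquad
  \frac{w_t}{\|w_t\|_2} \to \frac{w^\star}{\|w^\star\|_2},
  \quad
  w^\star = \arg\min_{w} \|w\|_2
  \ \ \text{s.t.}\ \ y_i \langle w, x_i \rangle \ge 1 \ \ \forall i.
\]
% Heuristic early-stopping / ridge correspondence that the paper's non-asymptotic
% norm and angle bounds make precise (the exact scaling below is an assumption here):
\[
  w_t \approx \arg\min_{w} \, \widehat{L}(w) + \frac{\lambda}{2} \|w\|_2^2,
  \qquad
  \lambda \asymp \frac{1}{\eta t}.
\]
```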

📝 Abstract
In overparameterized logistic regression, gradient descent (GD) iterates diverge in norm while converging in direction to the maximum $\ell_2$-margin solution -- a phenomenon known as the implicit bias of GD. This work investigates additional regularization effects induced by early stopping in well-specified high-dimensional logistic regression. We first demonstrate that the excess logistic risk vanishes for early-stopped GD but diverges to infinity for GD iterates at convergence. This suggests that early-stopped GD is well-calibrated, whereas asymptotic GD is statistically inconsistent. Second, we show that to attain a small excess zero-one risk, polynomially many samples are sufficient for early-stopped GD, while exponentially many samples are necessary for any interpolating estimator, including asymptotic GD. This separation underscores the statistical benefits of early stopping in the overparameterized regime. Finally, we establish non-asymptotic bounds on the norm and angular differences between early-stopped GD and the $\ell_2$-regularized empirical risk minimizer, thereby connecting the implicit regularization of GD with explicit $\ell_2$-regularization.
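
A small numerical sketch of the comparison described in the abstract, under assumptions not taken from the paper: synthetic Gaussian data with $d > n$, plain GD on the logistic loss, and an explicitly $\ell_2$-regularized fit of strength $1/(\eta t)$ as the comparison point. The data model, step size, and stopping times below are illustrative only.

```python
# Illustrative sketch only (not the paper's code): early-stopped GD on the logistic
# loss vs. an explicitly l2-regularized fit, on synthetic overparameterized data (d > n).
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 500                                 # many more features than samples
w_true = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = np.where(X @ w_true + 0.1 * rng.normal(size=n) > 0, 1.0, -1.0)

def logistic_grad(w, X, y):
    """Gradient of (1/n) * sum_i log(1 + exp(-y_i <w, x_i>))."""
    margins = np.clip(y * (X @ w), -500, 500)  # clip only to avoid overflow warnings
    s = -y / (1.0 + np.exp(margins))
    return X.T @ s / len(y)

def gd_iterates(X, y, eta=0.5, steps=20000):
    """Plain gradient descent on the unregularized logistic loss."""
    w = np.zeros(X.shape[1])
    for t in range(1, steps + 1):
        w -= eta * logistic_grad(w, X, y)
        yield t, w.copy()

def ridge_logistic(X, y, lam, eta=0.5, steps=20000):
    """Explicitly l2-regularized logistic regression, fit by long-run gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        w -= eta * (logistic_grad(w, X, y) + lam * w)
    return w

def angle(u, v):
    return np.arccos(np.clip(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)), -1.0, 1.0))

# The GD norm keeps growing on separable data, while an early-stopped iterate stays
# close (in norm and angle) to a ridge solution of strength roughly 1 / (eta * t).
eta = 0.5
for t, w_t in gd_iterates(X, y, eta=eta):
    if t in (100, 1000, 20000):
        w_ridge = ridge_logistic(X, y, lam=1.0 / (eta * t), eta=eta)
        print(f"t={t:6d}  ||w_t||={np.linalg.norm(w_t):7.2f}  "
              f"angle(w_t, w_ridge)={angle(w_t, w_ridge):.3f} rad")
```

The printout is expected to show the GD norm growing with $t$ while each early-stopped iterate stays angularly close to the corresponding ridge solution; this illustrates the qualitative effect only and does not reproduce the paper's bounds.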
Problem

Research questions and friction points this paper is trying to address.

Does early stopping improve the logistic risk of GD in overparameterized logistic regression?
Do polynomially many samples suffice for early-stopped gradient descent to attain small zero-one risk?
How does the implicit regularization of early-stopped GD relate to explicit $\ell_2$-regularization?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Early-stopped GD drives the excess logistic risk to zero
Polynomial sample complexity suffices for early-stopped GD
Non-asymptotic bounds connect early-stopped GD to explicit $\ell_2$-regularization