How abundant are good interpolators?

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

This work studies how generalization error is distributed among overparameterized linear classifiers that interpolate the training data, that is, classify every training point correctly at a fixed margin. In the proportional high-dimensional regime $n/d \to α$ with $α$ small enough, the authors establish a large deviation principle for the generalization error of a classifier drawn uniformly at random from the set of interpolators, under both a Gaussian mixture model and a logistic model with Gaussian features. The resulting rate function is deterministic, and it implies that all but an exponentially small fraction of interpolators have approximately the same generalization error, given by the unique maximizer of the rate function. Numerical comparisons indicate that, for small $α$, interpolators found by gradient descent on the empirical risk or by a natural linear program outperform the vast majority of interpolators, which points to nontrivial benign overfitting in this setting.

📝 Abstract

Let $S$ be the set of unit norm linear classifiers $θ\in \mathbb{R}^d$ which correctly classify every point of a labeled dataset $(X_i,y_i)_{i=1}^n$, $X_i \in \mathbb{R}^d$, $y_i \in \{-1,+1\}$, with a possibly negative margin $κ$ fixed in advance. Under two natural data-generating distributions of the $(X,y)$ pairs -- a Gaussian mixture model and a logistic model with Gaussian features -- and in the proportional regime $n/d \to α$ with small enough $α$, we establish a large deviation principle on the event that a point $θ$ chosen uniformly at random from $S$ achieves a given generalization error, with high probability over the choice of the data. The associated large deviation rate function is deterministic and describes the proportion, at the exponential scale in $d$, of interpolating classifiers having a given desired performance. As a consequence, we establish the following concentration phenomenon: all but an exponentially small fraction of interpolating classifiers have approximately the same generalization performance given by the unique maximizer of this rate function. We numerically compare this maximizer to the performance of empirical risk minimization by gradient descent and to the performance of a natural linear program, both finding a point in $S$, and deduce that in the overparametrized regime of small $α$, these efficient procedures outperform the vast majority of interpolators, pointing to their nontrivial benign overfitting in this setting.
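
To make the objects in the abstract concrete, the following is a minimal, finite-size sketch in pure Python. It is not the authors' code, and several choices are assumptions made only for illustration: the margin is set to $κ = 0$, the mixture is taken as $X = yμ + Z$ with $Z \sim N(0, I_d)$ and balanced labels, and the sizes, mean vector, step size, and function names are hypothetical. Under these assumptions the test error of a unit-norm $θ$ has the closed form $Φ(-\langle μ, θ\rangle)$. Uniform draws from $S$ are produced by naive rejection sampling, which is only feasible for very small $n$ and says nothing about the asymptotic statements of the paper.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def unit(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]


def sample_gmm(n, mu, rng):
    """Draw n labeled points X = y * mu + Z, Z ~ N(0, I_d), y uniform in {-1, +1}."""
    data = []
    for _ in range(n):
        y = rng.choice((-1, 1))
        data.append(([y * m + rng.gauss(0.0, 1.0) for m in mu], y))
    return data


def min_margin(theta, data):
    """Smallest y_i * <X_i, theta>; theta is in S (with kappa = 0) iff this is > 0."""
    return min(y * dot(x, theta) for x, y in data)


def gmm_error(theta, mu):
    """Test error of a unit-norm theta under the assumed mixture: Phi(-<mu, theta>)."""
    return 0.5 * math.erfc(dot(mu, theta) / math.sqrt(2.0))


def fit_logistic_gd(data, d, steps=300, lr=0.1):
    """Gradient descent on the average logistic loss, started from theta = 0."""
    theta = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for x, y in data:
            m = y * dot(x, theta)
            w = 1.0 / (1.0 + math.exp(min(m, 50.0)))  # sigmoid(-m), clipped for safety
            for j in range(d):
                grad[j] -= w * y * x[j] / len(data)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return unit(theta)


def random_interpolator(data, d, rng, max_tries=200_000):
    """Rejection sampling: uniform direction on the sphere, kept only if it lies in S."""
    for _ in range(max_tries):
        theta = unit([rng.gauss(0.0, 1.0) for _ in range(d)])
        if all(y * dot(x, theta) > 0 for x, y in data):
            return theta
    raise RuntimeError("no interpolator found; try a smaller n")


if __name__ == "__main__":
    rng = random.Random(0)
    n, d = 8, 80                     # n / d = 0.1, an illustrative small ratio
    mu = [2.0 / math.sqrt(d)] * d    # illustrative mean vector with norm 2
    data = sample_gmm(n, mu, rng)
    theta_gd = fit_logistic_gd(data, d)
    errs = [gmm_error(random_interpolator(data, d, rng), mu) for _ in range(20)]
    print("GD minimum margin:", min_margin(theta_gd, data))
    print("GD test error:", gmm_error(theta_gd, mu))
    print("mean test error of random interpolators:", sum(errs) / len(errs))
```

Running the script prints the training margin and test error of the gradient descent solution next to the average test error of a few uniformly sampled interpolators. At such small sizes the numbers fluctuate with the seed, so they should be read as an illustration of the comparison the abstract describes, not as a reproduction of its results.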

Problem

Research questions and friction points this paper is trying to address.

interpolating classifiers

generalization error

overparametrized regime

large deviation principle

benign overfitting

Innovation

Methods, ideas, or system contributions that make the work stand out.

interpolating classifiers

large deviation principle

benign overfitting