Scaling and renormalization in high-dimensional regression

📅 2024-05-01
🏛️ arXiv.org
📈 Citations: 20
Influential citations: 1
🤖 AI Summary
This work investigates the training and generalization behavior of high-dimensional ridge regression and random feature models, focusing on scaling laws and renormalization phenomena in the overparameterized regime. Methodologically, it leverages the S-transform from free probability theory to link the train-test generalization gap to spectral properties of the data covariance, yielding closed-form expressions for the training and generalization errors in a few lines of algebra. The analysis identifies a scaling regime in which the variance due to the random features limits overparameterized performance, and shows that anisotropic weight structure induces nontrivial exponents for finite-width corrections. Statistical fluctuations of the empirical covariance are absorbed into a renormalized ridge parameter, which in turn yields an analogue of the generalized cross-validation estimator. The results extend and unify earlier models of neural scaling laws, predicting and empirically validating several power-law behaviors of the test error as well as conditions for benign overfitting, and together provide a compact analytic framework for high-dimensional statistical learning.
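To make the renormalization concrete, here is a minimal sketch of the self-consistent (renormalized) ridge and the GCV-type train-test relation standard in this line of work; the notation ($\Sigma$ for the population covariance, $n$ for the sample size, $\lambda$ for the bare ridge, $\kappa$ for its renormalized counterpart) is ours and may differ from the paper's.

\[
\kappa \;=\; \lambda \;+\; \frac{\kappa}{n}\,\operatorname{tr}\!\left[\Sigma\,(\Sigma+\kappa I)^{-1}\right],
\qquad
E_{\text{test}} \;\approx\; \left(\frac{\kappa}{\lambda}\right)^{2} E_{\text{train}},
\]

so the multiplicative factor $(\kappa/\lambda)^2$ plays the role of the train-test gap and gives a generalized cross-validation-style estimate of the test error from the training error alone.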

📝 Abstract
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models using the basic tools of random matrix theory and free probability. We provide an introduction and review of recent results on these topics, aimed at readers with backgrounds in physics and deep learning. Analytic formulas for the training and generalization errors are obtained in a few lines of algebra directly from the properties of the $S$-transform of free probability. This allows for a straightforward identification of the sources of power-law scaling in model performance. We compute the generalization error of a broad class of random feature models. We find that in all models, the $S$-transform corresponds to the train-test generalization gap, and yields an analogue of the generalized-cross-validation estimator. Using these techniques, we derive fine-grained bias-variance decompositions for a very general class of random feature models with structured covariates. These novel results allow us to discover a scaling regime for random feature models where the variance due to the features limits performance in the overparameterized setting. We also demonstrate how anisotropic weight structure in random feature models can limit performance and lead to nontrivial exponents for finite-width corrections in the overparameterized setting. Our results extend and provide a unifying perspective on earlier models of neural scaling laws.
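As a hedged numerical illustration of these formulas (not code from the paper), the following Python sketch fits ridge regression on synthetic data with a power-law covariance spectrum, solves a self-consistent equation for a renormalized ridge kappa, and compares the empirical test error to the GCV-type estimate $(\kappa/\lambda)^2 E_{\text{train}}$. All symbols and parameter values (alpha, lam, d, n_train, and so on) are illustrative assumptions.

```python
# Hedged sketch: ridge regression with a power-law covariance spectrum,
# comparing empirical test error to a GCV-type estimate built from a
# renormalized ridge kappa. Parameter choices are illustrative only.
import numpy as np

rng = np.random.default_rng(0)

d, n_train, n_test = 2000, 500, 4000
alpha = 1.5                                           # spectral power-law exponent
eigs = np.arange(1, d + 1, dtype=float) ** (-alpha)   # population covariance spectrum
w_star = rng.normal(size=d) / np.sqrt(d)              # isotropic teacher weights
noise = 0.1                                           # label noise std
lam = 1e-3                                            # bare ridge parameter

def sample(n):
    # Features with covariance diag(eigs), labels from the teacher plus noise.
    X = rng.normal(size=(n, d)) * np.sqrt(eigs)
    y = X @ w_star + noise * rng.normal(size=n)
    return X, y

X, y = sample(n_train)
Xte, yte = sample(n_test)

# Ridge fit in the primal: w = (X^T X / n + lam I)^(-1) X^T y / n
w_hat = np.linalg.solve(X.T @ X / n_train + lam * np.eye(d), X.T @ y / n_train)
E_train = np.mean((y - X @ w_hat) ** 2)
E_test = np.mean((yte - Xte @ w_hat) ** 2)

# Renormalized ridge: fixed point of kappa = lam + (kappa/n) tr[S (S + kappa)^-1]
kappa = lam + 1.0
for _ in range(500):
    kappa = lam + (kappa / n_train) * np.sum(eigs / (eigs + kappa))

E_test_gcv = (kappa / lam) ** 2 * E_train             # GCV-type train -> test conversion

print(f"train error           : {E_train:.4f}")
print(f"test error (empirical): {E_test:.4f}")
print(f"test error (GCV-type) : {E_test_gcv:.4f}")
```

Sweeping n_train or the spectral exponent alpha in this sketch is a simple way to observe the power-law decay of test error that the paper analyzes.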
Problem

Research questions and friction points this paper is trying to address.

Understanding the training and generalization behavior of ridge regression in high dimensions
Explaining power-law scaling of model performance using free probability
Deriving fine-grained bias-variance decompositions for random feature models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Renormalization of the ridge parameter to absorb empirical-covariance fluctuations
Analytic training and generalization error formulas via the free-probability S-transform (see the sketch after this list)
Fine-grained bias-variance decomposition for random feature models with structured covariates
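For readers unfamiliar with the S-transform item above, here is a brief reminder of the standard free-probability definitions (textbook facts, not the paper's specific derivation): for a spectral distribution with moment series $\psi(z) = \sum_{k \ge 1} m_k z^k$ and functional inverse $\psi^{-1}$ near zero,

\[
S(z) \;=\; \frac{1+z}{z}\,\psi^{-1}(z),
\qquad
S_{ab}(z) \;=\; S_a(z)\,S_b(z) \quad \text{for freely independent } a, b.
\]

The multiplicativity property lets spectra of products of random matrices (data covariances composed with random feature maps) be handled in closed form, which is what allows the paper to obtain its error formulas in a few lines of algebra.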