The Nuclear Route: Sharp Asymptotics of ERM in Overparameterized Quadratic Networks

📅 2025-05-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper studies empirical risk minimization (ERM) for overparameterized two-layer neural networks with quadratic activations in the high-dimensional asymptotic regime. Methodologically, it establishes a rigorous equivalence between ℓ₂-regularized ERM and low-rank matrix sensing, mapping the nonconvex learning problem to a convex nuclear-norm-regularized optimization. The contributions are threefold: first, capacity control arises intrinsically from the low-rank structure of the learned feature map; second, the global optimum is characterized explicitly as a low-rank solution; third, a sharp, width-dependent phase transition threshold governs both training and test error. Leveraging tools from spin-glass theory, asymptotic spectral analysis, and high-dimensional statistical inference, the authors derive closed-form asymptotic expressions for the prediction error, validated by strong agreement with numerical experiments. Crucially, the work establishes low-rankness, not merely overparameterization, as the fundamental mechanism underlying generalization, and provides an exact learnability characterization for such networks.
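
The equivalence rests on a classical reparameterization. As a minimal sketch (assuming trainable second-layer signs $a_i \in \{\pm 1\}$ and width $m \ge d$; the paper's exact normalization may differ), a width-$m$ quadratic network computes

$$f(x) = \sum_{i=1}^m a_i \langle w_i, x \rangle^2 = \langle x x^\top, W \rangle, \qquad W := \sum_{i=1}^m a_i \, w_i w_i^\top,$$

and the minimal ℓ₂ cost of realizing a given symmetric $W$ is exactly its nuclear norm,

$$\min \Big\{ \sum_{i=1}^m \|w_i\|_2^2 \;:\; W = \sum_{i=1}^m a_i \, w_i w_i^\top \Big\} = \|W\|_*,$$

so ℓ₂-regularized ERM over the network weights becomes the convex matrix sensing program

$$\min_W \; \frac{1}{2n} \sum_{\mu=1}^n \big( y_\mu - \langle x_\mu x_\mu^\top, W \rangle \big)^2 + \lambda \|W\|_*.$$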

📝 Abstract
We study the high-dimensional asymptotics of empirical risk minimization (ERM) in over-parametrized two-layer neural networks with quadratic activations trained on synthetic data. We derive sharp asymptotics for both training and test errors by mapping the $\ell_2$-regularized learning problem to a convex matrix sensing task with nuclear norm penalization. This reveals that capacity control in such networks emerges from a low-rank structure in the learned feature maps. Our results characterize the global minima of the loss and yield precise generalization thresholds, showing how the width of the target function governs learnability. This analysis bridges and extends ideas from spin-glass methods, matrix factorization, and convex optimization and emphasizes the deep link between low-rank matrix sensing and learning in quadratic neural networks.
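
For a concrete numerical illustration of the convex side of this mapping, the following is a minimal proximal-gradient sketch in Python/NumPy, using singular value thresholding as the proximal operator of the nuclear norm. It is a generic solver for the nuclear-norm-penalized matrix sensing objective above, not the paper's analytical machinery; the function names (svt, matrix_sensing_erm) and the planted rank-2 target are hypothetical illustration choices.

    import numpy as np

    def svt(W, tau):
        # Proximal operator of tau * ||.||_*: soft-threshold the singular values.
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

    def matrix_sensing_erm(X, y, lam, iters=3000):
        # Proximal gradient on (1/2n) sum_mu (y_mu - <x_mu x_mu^T, W>)^2 + lam ||W||_*.
        n, d = X.shape
        W = np.zeros((d, d))
        # Conservative 1/L step size from the bound L <= (1/n) sum_mu ||x_mu||^4.
        step = n / np.sum(np.sum(X**2, axis=1) ** 2)
        for _ in range(iters):
            resid = np.einsum('ni,ij,nj->n', X, W, X) - y  # <x_mu x_mu^T, W> - y_mu
            grad = (X.T * resid) @ X / n                   # (1/n) sum_mu resid_mu x_mu x_mu^T
            W = svt(W - step * grad, step * lam)
        return W

    # Planted low-rank target: the regularized optimum is (approximately) low-rank,
    # and recovery succeeds once the sample size n is large enough.
    rng = np.random.default_rng(0)
    d, n, r = 30, 800, 2
    X = rng.standard_normal((n, d))
    U0 = rng.standard_normal((d, r))
    W_star = U0 @ U0.T / d
    y = np.einsum('ni,ij,nj->n', X, W_star, X)
    W_hat = matrix_sensing_erm(X, y, lam=1e-4)
    print(np.linalg.norm(W_hat - W_star) / np.linalg.norm(W_star))

Sweeping the sample ratio n/d and plotting the recovery error would trace out empirically the kind of learnability transition that the paper characterizes in closed form.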
Problem

Research questions and friction points this paper is trying to address.

Study high-dimensional asymptotics of ERM in overparameterized quadratic networks
Derive sharp asymptotics for training and test errors via convex matrix sensing
Characterize global minima and generalization thresholds in quadratic neural networks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Mapping ERM to convex matrix sensing
Nuclear norm penalization for capacity control
Low-rank structure in feature maps