Models of Heavy-Tailed Mechanistic Universality

📅 2025-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep neural networks exhibit ubiquitous heavy-tailed (power-law) spectral statistics, e.g., in Jacobian, Hessian, and weight-matrix eigenvalue distributions, yet the mechanisms linking such spectra to generalization remain poorly understood. Method: We propose a tunable family of random matrix models, the high-temperature Marchenko–Pastur (HTMP) ensemble, grounded in random matrix theory and high-dimensional spectral analysis. HTMP attributes heavy tails to three independent implicit-bias sources: correlation structure in the data, reduced training temperature, and reduced eigenvector entropy; a single "eigenvalue repulsion" parameter governs the power-law exponents of both the upper and lower spectral tails. Contribution/Results: The model reproduces key empirical phenomena, including neural scaling laws, heavy-tailed optimizer trajectories, and the five-plus-one phases of neural network training, and is consistent with the observed strong correlations between heavy-tailed spectral metrics and generalization performance, establishing a unifying spectral framework for understanding deep learning efficacy.

📝 Abstract
Recent theoretical and empirical successes in deep learning, including the celebrated neural scaling laws, are punctuated by the observation that many objects of interest tend to exhibit some form of heavy-tailed or power law behavior. In particular, the prevalence of heavy-tailed spectral densities in Jacobians, Hessians, and weight matrices has led to the introduction of the concept of heavy-tailed mechanistic universality (HT-MU). Multiple lines of empirical evidence suggest a robust correlation between heavy-tailed metrics and model performance, indicating that HT-MU may be a fundamental aspect of deep learning efficacy. Here, we propose a general family of random matrix models -- the high-temperature Marchenko-Pastur (HTMP) ensemble -- to explore attributes that give rise to heavy-tailed behavior in trained neural networks. Under this model, spectral densities with power laws on (upper and lower) tails arise through a combination of three independent factors (complex correlation structures in the data; reduced temperatures during training; and reduced eigenvector entropy), appearing as an implicit bias in the model structure, and they can be controlled with an "eigenvalue repulsion" parameter. Implications of our model on other appearances of heavy tails, including neural scaling laws, optimizer trajectories, and the five-plus-one phases of neural network training, are discussed.
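For context on the baseline that the HTMP ensemble generalizes: the classical Marchenko-Pastur law describes the eigenvalue spectrum of a sample covariance matrix built from i.i.d. data, which has a sharp spectral edge and no heavy tails. A minimal numpy sketch comparing the two (the function name, grid, and matrix sizes are illustrative choices, not taken from the paper):

```python
import numpy as np

def marchenko_pastur_pdf(x, q, sigma2=1.0):
    """Classical Marchenko-Pastur density for aspect ratio q = p/n <= 1."""
    lam_minus = sigma2 * (1.0 - np.sqrt(q)) ** 2
    lam_plus = sigma2 * (1.0 + np.sqrt(q)) ** 2
    pdf = np.zeros_like(x)
    inside = (x > lam_minus) & (x < lam_plus)
    pdf[inside] = np.sqrt((lam_plus - x[inside]) * (x[inside] - lam_minus)) / (
        2.0 * np.pi * sigma2 * q * x[inside]
    )
    return pdf

rng = np.random.default_rng(0)
n, p = 4000, 1000                       # samples x features, q = p/n = 0.25
X = rng.standard_normal((n, p))         # i.i.d. data: no correlation structure
eigs = np.linalg.eigvalsh(X.T @ X / n)  # sample-covariance eigenvalues

# Empirical spectrum stays inside the MP support [(1-sqrt(q))^2, (1+sqrt(q))^2];
# heavy tails only appear once correlation, temperature, or entropy effects are added.
grid = np.linspace(1e-3, 3.0, 2000)
density = marchenko_pastur_pdf(grid, q=p / n)
```

Any of the three factors discussed in the abstract breaks this light-tailed baseline, which is what the HTMP ensemble is designed to capture.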
Problem

Research questions and friction points this paper is trying to address.

Explores heavy-tailed behavior in neural networks
Proposes HTMP ensemble to model power-law spectra
Links heavy-tailed metrics to deep learning performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

HTMP ensemble models heavy-tailed behavior
Eigenvalue repulsion controls spectral densities
Data correlation, training temperature, and eigenvector entropy act as independent heavy-tail sources
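The first of these factors, correlation structure in the data, can be illustrated with a toy experiment: giving the population covariance power-law eigenvalues yields a sample spectrum whose upper tail is itself power-law, which a Hill estimator detects as a small tail index. This is a hedged sketch; the 1/i decay profile and the estimator choice are illustrative, not the paper's construction:

```python
import numpy as np

def hill_exponent(eigs, k=100):
    """Hill estimator of the upper-tail power-law index (larger = lighter tail)."""
    s = np.sort(eigs)[::-1]  # eigenvalues in decreasing order
    return k / np.sum(np.log(s[:k] / s[k]))

rng = np.random.default_rng(1)
n, p = 4000, 1000

# Isotropic data: Marchenko-Pastur bulk with a sharp, light-tailed edge.
X_iso = rng.standard_normal((n, p))
eigs_iso = np.linalg.eigvalsh(X_iso.T @ X_iso / n)

# Correlated data: population covariance with power-law eigenvalues
# (a hypothetical 1/i decay profile, chosen only for illustration).
pop = np.arange(1, p + 1) ** -1.0
X_cor = X_iso * np.sqrt(pop)            # columns rescaled: cov = diag(pop)
eigs_cor = np.linalg.eigvalsh(X_cor.T @ X_cor / n)

# The correlated spectrum has a much heavier upper tail (smaller Hill index).
alpha_cor = hill_exponent(eigs_cor)
alpha_iso = hill_exponent(eigs_iso)
```

The same qualitative effect is what the paper attributes, in trained networks, to correlation structure acting as an implicit bias on the spectrum.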
Liam Hodgkinson
University of Melbourne
probabilistic machine learning, deep learning theory
Zhichao Wang
Department of Statistics, University of California, Berkeley CA, USA
Michael W. Mahoney
Department of Statistics, University of California, Berkeley CA, USA; International Computer Science Institute, Berkeley CA, USA; Lawrence Berkeley National Laboratory, Berkeley CA, USA