Spectral Reach: Understanding Neural Scaling as Progress into the Spectral Tail

📅 2026-05-29

🤖 AI Summary

Existing theories struggle to explain the mechanisms behind neural scaling laws, partly because scalable analysis tools are lacking. This work proposes a metric, “spectral position,” that measures which eigenvalues of the empirical neural tangent kernel (eNTK) currently drive loss reduction. Spectral position decreases throughout training, which shows that learning shifts from dominant eigenmodes into the tail of the spectrum. The paper further introduces “spectral reach”: larger models reach further into the tail than smaller ones, which suggests that they achieve lower loss by sustaining learning on weak spectral signals inaccessible to smaller models. Feature learning is identified as a key enabler of spectral reach: by adaptively amplifying gradient magnitudes as learning advances, it sustains progress where frozen representations stall. According to the authors, these findings point to concrete interventions through architecture and optimizer design.

📝 Abstract

Neural scaling laws describe predictable power-law relationships between model size, dataset size, compute, and performance. While these laws guide the development of modern foundation models, the mechanisms underpinning them remain poorly understood, in part due to the absence of scalable analysis tools. To close this gap, we introduce "spectral position": a scalable measure of which eigenvalues of the empirical neural tangent kernel (eNTK) currently drive loss reduction. Applying this measure to scaling experiments, we find that spectral position decreases throughout training: learning shifts from dominant eigenmodes into the spectral tail. Larger models reach further into the tail than smaller models, revealing a size-dependent capacity we call "spectral reach". This suggests why larger models achieve lower losses: they sustain learning on weak spectral signals inaccessible to smaller models. We further identify feature learning as a key enabler of spectral reach. It adaptively amplifies gradient magnitudes as learning advances, sustaining progress where frozen representations stall. This points to concrete interventions through architecture and optimizer design.
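To make the idea of a spectral measure of loss reduction concrete, here is a minimal sketch in Python with NumPy. It is not the paper's definition of spectral position, which this summary does not spell out. It assumes squared loss and gradient flow, where the instantaneous loss decrease splits across eNTK eigenmodes as lambda_i * (u_i . r)^2, with r the residual and (lambda_i, u_i) the eigenpairs of K = J J^T. The proxy below (a hypothetical `spectral_position` function) is the contribution-weighted mean of log10(lambda_i / lambda_max): 0 means all progress comes from the top mode, and more negative values mean progress comes from deeper in the tail. The demo uses a linear model with fixed features, that is, the frozen-representation case, and a synthetic power-law spectrum chosen only for illustration.

```python
import numpy as np


def entk(jac):
    """Empirical NTK Gram matrix K = J J^T for a Jacobian of shape (n_samples, n_params)."""
    return jac @ jac.T


def spectral_position(jac, residual, tol=1e-12):
    """Illustrative proxy (an assumption, not the paper's metric).

    Under squared loss and gradient flow, mode i reduces the loss at rate
    lambda_i * (u_i . r)^2. We weight log10(lambda_i / lambda_max) by each
    mode's share of that rate.
    """
    eigvals, eigvecs = np.linalg.eigh(entk(jac))
    keep = eigvals > tol * eigvals.max()          # drop numerically null modes
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    contrib = eigvals * (eigvecs.T @ residual) ** 2
    weights = contrib / contrib.sum()
    return float(np.sum(weights * np.log10(eigvals / eigvals.max())))


def run_demo(n=20, steps=(0, 10, 100, 1000), lr=0.5, seed=0):
    """Gradient descent on a linear model f(x) = phi(x) . w with frozen features."""
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))   # orthonormal eigenvectors
    lam = np.arange(1, n + 1, dtype=float) ** -2.0     # synthetic power-law spectrum
    phi = q * np.sqrt(lam)                             # so phi @ phi.T = q diag(lam) q.T
    y = q.sum(axis=1)                                  # equal target weight on every mode
    w = np.zeros(n)
    positions = {}
    for t in range(max(steps) + 1):
        r = phi @ w - y                                # residual at step t
        if t in steps:
            # For a linear model the Jacobian w.r.t. w is just phi.
            positions[t] = spectral_position(phi, r)
        w -= lr * phi.T @ r                            # gradient step on 0.5 * ||r||^2
    return positions


if __name__ == "__main__":
    for step, pos in run_demo().items():
        print(f"step {step:5d}  position {pos:.3f}")
```

In this frozen-feature setting the proxy falls monotonically as training proceeds, because large-eigenvalue modes are fit first and the remaining error sits in small-eigenvalue modes. For a real network the Jacobian would have to be recomputed at each checkpoint, since feature learning changes the eNTK itself.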

Problem

Research questions and friction points this paper is trying to address.

neural scaling laws

spectral tail

empirical neural tangent kernel

feature learning

model capacity

Innovation

Methods, ideas, or system contributions that make the work stand out.

spectral reach

neural scaling laws

empirical neural tangent kernel