AI Summary
This work addresses the problem of determining the optimal neural network parameter count under compute constraints and infinite data. We propose an analytically tractable neural scaling model that characterizes the relationship among data complexity, target complexity, and model size. We rigorously establish the existence of four primary and three secondary computational scaling regimes, where phase boundaries are governed by the relative dominance of model capacity, optimizer-induced noise, and feature embedding geometry. Training with one-pass SGD on a mean-squared loss, we derive closed-form expressions for the full training loss trajectory. Combining theoretical analysis with large-scale numerical experiments, we precisely quantify the scaling exponents in each regime and provide explicit closed-form formulas for the optimal parameter count as a function of floating-point operation budget, thereby substantially improving predictive accuracy for large-model training efficiency.
Abstract
We consider a solvable neural scaling model with three parameters: data complexity, target complexity, and model parameter count. We use this model to derive new predictions about the compute-limited, infinite-data scaling law regime. To train the neural scaling model, we run one-pass stochastic gradient descent on a mean-squared loss. We derive a representation of the loss curves that holds over all iteration counts and improves in accuracy as the model parameter count grows. We then analyze the compute-optimal model parameter count and identify four phases (plus three subphases) in the data-complexity/target-complexity phase plane. The phase boundaries are determined by the relative importance of model capacity, optimizer noise, and the embedding of the features. We furthermore derive, with mathematical proof and extensive numerical evidence, the scaling-law exponents in all of these phases, in particular computing the optimal model parameter count as a function of floating-point operation budget.
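To make the training setup concrete, here is a minimal sketch of one-pass SGD on a mean-squared loss for a linear model with a power-law feature spectrum. The exponents `alpha` and `beta` stand in for the data- and target-complexity parameters, but the specific spectrum, learning rate, and dimensions are illustrative assumptions, not the paper's exact model.

```python
import numpy as np

rng = np.random.default_rng(0)

d = 100                      # model parameter count (illustrative)
alpha, beta = 1.0, 1.5       # stand-ins for data/target complexity exponents

# Power-law feature spectrum and target coefficients (illustrative assumption).
eigs = np.arange(1, d + 1, dtype=float) ** -alpha
target = np.arange(1, d + 1, dtype=float) ** -beta

def population_loss(w):
    # Expected mean-squared loss E[0.5 * (x @ w - x @ target)**2]
    # over x with independent N(0, eigs[k]) coordinates.
    return 0.5 * float(np.sum(eigs * (w - target) ** 2))

w = np.zeros(d)
lr = 0.1
loss_start = population_loss(w)
for step in range(20_000):
    # One-pass SGD: each sample is drawn fresh and used exactly once.
    x = rng.standard_normal(d) * np.sqrt(eigs)
    err = float(x @ w - x @ target)
    w -= lr * err * x        # gradient of 0.5 * err**2 with respect to w
loss_end = population_loss(w)

print(loss_start, loss_end)  # loss_end should sit well below loss_start
```

Because every sample is used only once, the iteration count doubles as the data budget, which is what makes the compute-limited, infinite-data regime natural for this model: total compute scales with (parameter count) × (number of SGD steps).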