Phase diagram and eigenvalue dynamics of stochastic gradient descent in multilayer neural networks

📅 2025-09-01
📈 Citations: 0 · Influential citations: 0
🤖 AI Summary
This work investigates how SGD hyperparameters—learning rate, batch size, and initial weight variance—affect the training dynamics of multilayer neural networks. We introduce a phase-diagram framework based on the evolution of singular values of weight matrices, wherein the ratio of initial weight variance to the learning rate–batch size quotient is interpreted as an effective disorder strength relative to an effective temperature. Leveraging a Langevin equation derived from Dyson Brownian motion, mean-field theory, and random matrix theory, we characterize the stochastic dynamics of soft spin degrees of freedom in feature space. Our analysis reveals three distinct dynamical phases—convergent, oscillatory, and divergent—each corresponding to qualitatively different training behaviors. Based on this classification, we establish theoretical criteria for hyperparameter selection. This study provides the first unified statistical-physical perspective and quantitative foundation for understanding SGD optimization mechanisms and guiding practical hyperparameter tuning.
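The summary's central mapping, initial weight variance as disorder strength and the learning-rate-to-batch-size ratio as an effective temperature, can be sketched in a few lines. The function names and the numerical values below are illustrative assumptions, not quantities taken from the paper.

```python
# Hedged sketch of the hyperparameter mapping described in the summary.
# All names and values here are illustrative assumptions.

def effective_temperature(eta, batch_size):
    """Effective temperature: learning rate divided by batch size."""
    return eta / batch_size

def disorder_to_temperature_ratio(sigma2_init, eta, batch_size):
    """Ratio of initial weight variance (disorder strength) to the
    effective temperature, the control parameter of the phase diagram."""
    return sigma2_init / effective_temperature(eta, batch_size)

ratio = disorder_to_temperature_ratio(sigma2_init=0.01, eta=1e-3, batch_size=32)
print(f"disorder/temperature ratio: {ratio:.1f}")
```

In this picture, tuning the learning rate or batch size moves the model along the temperature axis of the phase diagram, while the initialisation scale moves it along the disorder axis.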

📝 Abstract
Hyperparameter tuning is one of the essential steps to guarantee the convergence of machine learning models. We argue that intuition about the optimal choice of hyperparameters for stochastic gradient descent can be obtained by studying a neural network's phase diagram, in which each phase is characterised by distinctive dynamics of the singular values of weight matrices. Taking inspiration from disordered systems, we start from the observation that the loss landscape of a multilayer neural network with mean squared error can be interpreted as a disordered system in feature space, where the learnt features are mapped to soft spin degrees of freedom, the initial variance of the weight matrices is interpreted as the strength of the disorder, and temperature is given by the ratio of the learning rate and the batch size. As the model is trained, three phases can be identified, in which the dynamics of weight matrices is qualitatively different. Employing a Langevin equation for stochastic gradient descent, previously derived using Dyson Brownian motion, we demonstrate that the three dynamical regimes can be classified effectively, providing practical guidance for the choice of hyperparameters of the optimiser.
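The abstract's diagnostic, the evolution of the singular values of the weight matrices during training, is straightforward to monitor. The following is a minimal sketch, not the paper's code: a one-hidden-layer network trained with plain SGD on a mean-squared-error loss, recording the singular value spectrum of the first weight matrix at every step. Architecture, data, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's setup): track singular
# values of a weight matrix during SGD with a mean-squared-error loss.
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, n_samples = 8, 16, 4, 64
X = rng.standard_normal((n_samples, n_in))
Y = rng.standard_normal((n_samples, n_out))

sigma2_init = 0.1                          # disorder strength (assumed)
W1 = rng.normal(0.0, np.sqrt(sigma2_init), (n_in, n_hidden))
W2 = rng.normal(0.0, np.sqrt(sigma2_init), (n_hidden, n_out))

eta, batch = 1e-2, 16                      # temperature ~ eta / batch
history = []                               # singular values of W1 per step
for step in range(200):
    idx = rng.choice(n_samples, batch, replace=False)
    x, y = X[idx], Y[idx]
    h = np.tanh(x @ W1)                    # forward pass
    pred = h @ W2
    err = pred - y                         # MSE gradient pieces
    gW2 = h.T @ err / batch
    gW1 = x.T @ ((err @ W2.T) * (1 - h**2)) / batch
    W1 -= eta * gW1
    W2 -= eta * gW2
    history.append(np.linalg.svd(W1, compute_uv=False))

history = np.array(history)                # shape: (steps, min(n_in, n_hidden))
print(history.shape)
```

Plotting each column of `history` against the step index is one way to see whether the spectrum settles, oscillates, or grows, the qualitative behaviours the abstract associates with the three phases.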
Problem

Research questions and friction points this paper is trying to address.

Characterizing neural network phase diagram dynamics
Mapping hyperparameters to disordered system properties
Classifying dynamical regimes for optimizer guidance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Phase diagram classifies SGD dynamics
Loss landscape as disordered system analogy
Langevin equation guides hyperparameter selection
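The Langevin equation referenced above is derived from Dyson Brownian motion, the stochastic process obeyed by the eigenvalues of a symmetric matrix undergoing a matrix-valued Ornstein-Uhlenbeck process. A minimal sketch of that underlying process, with the temperature playing the role of the learning-rate-to-batch-size ratio, might look as follows; the matrix size, temperature, and step count are illustrative assumptions.

```python
# Illustrative sketch: eigenvalues of a symmetric Ornstein-Uhlenbeck
# matrix process undergo Dyson Brownian motion, the process underlying
# the paper's Langevin description. Parameters are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n, T, dt, steps = 16, 1e-3, 1e-3, 1000     # T plays the role of eta/batch

# Start from a random symmetric matrix.
M = rng.standard_normal((n, n))
M = (M + M.T) / 2

for _ in range(steps):
    G = rng.standard_normal((n, n))
    noise = (G + G.T) / np.sqrt(2)         # symmetric (GOE-type) increment
    M += -M * dt + np.sqrt(2 * T * dt) * noise  # Euler-Maruyama step

eigs = np.linalg.eigvalsh(M)               # eigenvalues, ascending order
print(eigs.min(), eigs.max())
```

Evolving the matrix rather than the eigenvalues directly sidesteps the numerical stiffness of the pairwise-repulsion drift term while producing the same eigenvalue dynamics.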
Chanju Park
Centre for Quantum Fields and Gravity, Department of Physics, Swansea University, Swansea SA2 8PP, United Kingdom
Biagio Lucini
Professor of Mathematics
Theoretical Particle Physics, Lattice Gauge Theories
Gert Aarts
Swansea University
Theoretical physics, high-energy physics