Approximate Gaussianity Beyond Initialisation in Neural Networks

📅 2025-10-06
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates how the distributions of neural network weight matrices evolve during training, examining the validity of the Gaussianity and permutation-symmetry assumptions. Method: the general 13-parameter permutation-invariant Gaussian matrix model, which goes beyond the conventional i.i.d. assumption, is fitted by combining representation theory (to encode the symmetry constraints) with graph theory (to construct the observable statistics), yielding an interpretable model of inter-weight correlations. Contribution/Results: the Wasserstein distance is used to quantify the movement of the distributions over training; the robustness of the Gaussian approximation is validated across broad training regimes, and the paper identifies how initialisation, regularisation, depth, and width govern deviations from Gaussianity.

📝 Abstract
Ensembles of neural network weight matrices are studied through the training process for the MNIST classification problem, testing the efficacy of matrix models for representing their distributions, under assumptions of Gaussianity and permutation-symmetry. The general 13-parameter permutation invariant Gaussian matrix models are found to be effective models for the correlated Gaussianity in the weight matrices, beyond the range of applicability of the simple Gaussian with independent identically distributed matrix variables, and notably well beyond the initialisation step. The representation theoretic model parameters, and the graph-theoretic characterisation of the permutation invariant matrix observables give an interpretable framework for the best-fit model and for small departures from Gaussianity. Additionally, the Wasserstein distance is calculated for this class of models and used to quantify the movement of the distributions over training. Throughout the work, the effects of varied initialisation regimes, regularisation, layer depth, and layer width are tested for this formalism, identifying limits where particular departures from Gaussianity are enhanced and how more general, yet still highly-interpretable, models can be developed.
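The abstract refers to a graph-theoretic characterisation of permutation invariant matrix observables: statistics of a square weight matrix W that are unchanged under simultaneous row and column permutations W → P W Pᵀ. As a hedged illustration (not the paper's code; the function names and the particular invariants chosen here are for demonstration only), a few of the low-order invariants can be computed with numpy:

```python
import numpy as np

def linear_invariants(W):
    """The two linearly independent observables of an n x n matrix W
    invariant under W -> P W P^T: the trace and the total entry sum."""
    return np.trace(W), W.sum()

def quadratic_invariants(W):
    """A sample of the quadratic permutation-invariant observables
    (sums of products of two entries over permutation orbits)."""
    return {
        "Tr(W^2)": np.trace(W @ W),
        "Tr(W W^T)": np.trace(W @ W.T),
        "(sum_ij W_ij)^2": W.sum() ** 2,
        "sum_i (row sum_i)^2": (W.sum(axis=1) ** 2).sum(),
    }

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 5))
P = np.eye(5)[rng.permutation(5)]  # random permutation matrix

# The observables are unchanged under the simultaneous permutation.
for a, b in zip(linear_invariants(W), linear_invariants(P @ W @ P.T)):
    assert np.isclose(a, b)
```

Expectation values of such invariants under the 13-parameter Gaussian model are what make the best-fit parameters interpretable; the full model uses all eleven quadratic invariants alongside the two linear ones.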
Problem

Research questions and friction points this paper is trying to address.

Modeling neural network weight distributions beyond initialization
Testing permutation-invariant Gaussian models for MNIST classification
Quantifying departures from Gaussianity across training regimes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using permutation-invariant Gaussian matrix models for weights
Applying Wasserstein distance to quantify distribution changes
Testing model across varied architectures and initializations
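The Wasserstein distance mentioned above has a closed form for Gaussian distributions, which is what makes it computable for this class of models. A minimal sketch of that closed-form 2-Wasserstein distance (assuming numpy and scipy; this is an illustrative implementation, not the authors'):

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_w2(m1, S1, m2, S2):
    """2-Wasserstein distance between N(m1, S1) and N(m2, S2):
    W2^2 = |m1 - m2|^2 + Tr(S1 + S2 - 2 (S1^{1/2} S2 S1^{1/2})^{1/2})."""
    rS1 = sqrtm(S1)
    cross = sqrtm(rS1 @ S2 @ rS1)
    d2 = np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * np.real(cross))
    return np.sqrt(max(float(d2), 0.0))  # clip tiny negative rounding error

m = np.zeros(3)
S = np.eye(3)
print(gaussian_w2(m, S, m, S))        # identical Gaussians -> 0.0
print(gaussian_w2(m, S, m + 2.0, S))  # pure mean shift -> sqrt(12) = 3.464...
```

With equal covariances the distance reduces to the Euclidean distance between the means, so tracking it over training epochs measures how far the fitted 13-parameter distribution moves in both mean and correlation structure.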