Effects of Initialization Biases on Deep Neural Network Training Dynamics

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deep neural networks exhibit an Initial Guessing Bias immediately after random initialization: they produce excessively high confidence for a few classes and near-zero outputs for the rest, which distorts early training dynamics. This bias couples strongly with the loss function, causing noise-robust losses (e.g., Blurry and Piecewise-zero) to lose gradient guidance during early epochs and even to amplify erroneous class preferences. Method: theoretical analysis and empirical validation via probabilistic trajectory modeling, training-path visualization, and response-evolution analysis, used to systematically uncover the causal interplay among initialization bias, loss-function design, and optimization direction. Contribution/Results: we establish and empirically verify the principle that loss functions must be adapted to the initialization distribution. This insight provides both theoretical justification and actionable guidance for designing robust training procedures, particularly under label noise, and reveals previously overlooked initialization–loss interactions that are critical for stable and reliable early-stage optimization.
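The bias itself is straightforward to probe empirically. The sketch below is a minimal illustration, not the paper's experimental setup: the MLP architecture, input dimensionality, and random inputs are all assumptions. It passes a batch through a freshly initialized network and tallies how often each class is predicted; under Initial Guessing Bias, a handful of classes absorb nearly all predictions while the rest receive almost none.

```python
# Sketch: observing Initial Guessing Bias in a freshly initialized network.
# The MLP architecture, input dimensionality, and random inputs are
# illustrative assumptions, not the paper's experimental setup.
import torch
import torch.nn as nn

torch.manual_seed(0)
num_classes = 10

model = nn.Sequential(                      # any untrained classifier works here
    nn.Linear(784, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, num_classes),
)

x = torch.randn(10_000, 784)                # stand-in inputs; real data in practice
with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)

# Fraction of samples whose argmax falls on each class: under Initial Guessing
# Bias a few classes dominate while most classes are (almost) never predicted.
pred_freq = torch.bincount(probs.argmax(dim=1), minlength=num_classes) / len(x)
print("predicted-class frequencies:", pred_freq.tolist())

# Mean predicted probability per class: the skew also shows up here, rather
# than the uniform 1/num_classes one might naively expect.
print("mean class probabilities:   ", probs.mean(dim=0).tolist())
```

How pronounced the skew is depends on depth, width, and activation choice; the sketch is only meant to show that the bias can be read off directly from a model's first forward pass, before any training.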

📝 Abstract
Untrained large neural networks, just after random initialization, tend to favour a small subset of classes, assigning high predicted probabilities to these few classes and approximately zero probability to all others. This bias, termed Initial Guessing Bias, affects the early training dynamics, when the model is fitting to the coarse structure of the data. The choice of loss function against which to train the model has a large impact on how these early dynamics play out. Two recent loss functions, Blurry and Piecewise-zero loss, were designed for robustness to label errors but can become unable to steer the direction of training when exposed to this initial bias. Results indicate that the choice of loss function has a dramatic effect on the early-phase training of networks and highlight the need for careful consideration of how Initial Guessing Bias may interact with the various components of the training scheme.
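To see why the early gradient signal matters, the toy comparison below contrasts cross-entropy with a hypothetical loss that plateaus once the true-class probability is very low. The plateaued loss is a stand-in for the saturating behaviour of noise-robust losses in general; it is not the actual Blurry or Piecewise-zero formulation from the paper, and the biased logits are fabricated for illustration. At an initialization that already assigns near-zero probability to the true class, the plateaued loss yields a vanishing gradient and therefore cannot steer training.

```python
# Toy comparison: gradient signal at a biased initialization.
# `flat_tail_loss` is a hypothetical stand-in for losses that saturate on
# confidently wrong predictions; it is NOT the paper's Blurry or
# Piecewise-zero loss.
import torch
import torch.nn.functional as F

# Logits mimicking Initial Guessing Bias: class 1 (wrong) receives nearly all
# probability mass, while the true class (index 0) gets essentially zero.
logits = torch.tensor([[-8.0, 12.0, -8.0, -8.0]], requires_grad=True)
target = torch.tensor([0])

# Cross-entropy keeps a strong gradient toward the true class even here.
ce = F.cross_entropy(logits, target)
ce_grad, = torch.autograd.grad(ce, logits)


def flat_tail_loss(logits, target, threshold=0.05):
    """Plateaus (constant loss, zero gradient) once the true-class
    probability falls below `threshold`."""
    p_true = torch.softmax(logits, dim=1).gather(1, target.unsqueeze(1))
    return torch.where(p_true < threshold,
                       torch.ones_like(p_true),   # flat region: no gradient
                       1.0 - p_true).mean()


ft = flat_tail_loss(logits, target)
ft_grad, = torch.autograd.grad(ft, logits)

print("cross-entropy gradient norm:", ce_grad.norm().item())   # clearly non-zero
print("flat-tail loss gradient norm:", ft_grad.norm().item())  # 0: no guidance
```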
Problem

Research questions and friction points this paper is trying to address.

Initial Guessing Bias affects early neural network training dynamics
Loss function choice significantly impacts training under initialization bias
Blurry and Piecewise-zero losses struggle with initial class prediction bias
Innovation

Methods, ideas, or system contributions that make the work stand out.

Establishes and empirically verifies the principle that loss functions must be adapted to the initialization distribution
Probabilistic trajectory modeling, training-path visualization, and response-evolution analysis of early training dynamics
Reveals initialization–loss interactions that cause noise-robust losses (Blurry, Piecewise-zero) to lose gradient guidance in early epochs
Nicholas Pellegrino
Doctoral Candidate, Systems Design Engineering, University of Waterloo
David Szczecina
Vision and Image Processing Group, Systems Design Engineering, University of Waterloo; Mechanical & Mechatronics Engineering, University of Waterloo
Paul W. Fieguth
Vision and Image Processing Group, Systems Design Engineering, University of Waterloo