The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions

📅 2025-06-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates chaotic sensitivity to initial conditions during the early phase of neural network training. The study reveals that even infinitesimal perturbations cause otherwise identically initialized trajectories to diverge rapidly within a few epochs, with divergence decaying after its peak, demonstrating a pronounced "butterfly effect" in early training. Methodologically, the authors present a systematic quantification of this sensitivity via several divergence metrics: L² parameter distance, the loss barrier under interpolation, weight permutation alignment, and centered kernel alignment (CKA) for representation similarity. They further analyze how hyperparameter choices and fine-tuning paths steer optimization toward distinct minima. The results provide theoretical grounding and empirical evidence relevant to model fusion, robustness of fine-tuning, and diversity in ensemble learning.
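The first two metrics named in the summary can be sketched concretely. Below is a minimal illustration (not the paper's code) of L² parameter distance and the loss interpolation barrier, evaluated on a hypothetical toy loss with two symmetric minima standing in for the minima two diverged trajectories might reach; the function names `l2_distance` and `loss_barrier` are assumptions for illustration.

```python
import numpy as np

def l2_distance(theta_a, theta_b):
    """L2 distance between two flattened parameter vectors."""
    return float(np.linalg.norm(theta_a - theta_b))

def loss_barrier(loss_fn, theta_a, theta_b, n_points=11):
    """Height of the loss barrier along the linear path between two
    parameter vectors: max interpolated loss minus the mean endpoint loss."""
    alphas = np.linspace(0.0, 1.0, n_points)
    path_losses = [loss_fn((1 - a) * theta_a + a * theta_b) for a in alphas]
    endpoint_mean = 0.5 * (path_losses[0] + path_losses[-1])
    return float(max(path_losses) - endpoint_mean)

# Toy quadratic loss with minima at +1 and -1 in each coordinate,
# a hypothetical stand-in for a trained network's loss surface.
loss = lambda theta: float(np.mean((theta**2 - 1.0) ** 2))

theta_a = np.array([1.0, 1.0])    # one minimum
theta_b = np.array([-1.0, -1.0])  # a distinct minimum

print(l2_distance(theta_a, theta_b))          # 2*sqrt(2) ≈ 2.828
print(loss_barrier(loss, theta_a, theta_b))   # midpoint barrier: 1.0
```

A barrier near zero suggests the two networks lie in the same (linearly connected) basin; a large barrier, as here, indicates the trajectories diverged to distinct minima.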

📝 Abstract
Neural network training is inherently sensitive to initialization and the randomness induced by stochastic gradient descent. However, it is unclear to what extent such effects lead to meaningfully different networks, either in terms of the models' weights or the underlying functions that were learned. In this work, we show that during the initial "chaotic" phase of training, even extremely small perturbations reliably cause otherwise identical training trajectories to diverge, an effect that diminishes rapidly over training time. We quantify this divergence through (i) $L^2$ distance between parameters, (ii) the loss barrier when interpolating between networks, (iii) $L^2$ distance and loss barrier between parameters after permutation alignment, and (iv) representational similarity between intermediate activations, revealing how perturbations across different hyperparameter or fine-tuning settings drive training trajectories toward distinct loss minima. Our findings provide insights into neural network training stability, with practical implications for fine-tuning, model merging, and diversity of model ensembles.
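Metric (iv), representational similarity, is typically measured with linear centered kernel alignment (CKA). A minimal sketch of linear CKA, assuming activation matrices of shape (samples, features) (the names `linear_cka`, `X`, `Z` are illustrative, not from the paper):

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two activation matrices
    of shape (n_samples, n_features). Returns 1.0 for representations
    identical up to an orthogonal transform, near 0 for unrelated ones."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return float(num / den)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))                 # activations of network A
R, _ = np.linalg.qr(rng.normal(size=(16, 16))) # random orthogonal matrix
Z = rng.normal(size=(100, 16))                 # unrelated activations

print(linear_cka(X, X @ R))  # ≈ 1.0: invariant to orthogonal transforms
print(linear_cka(X, Z))      # small: representations are unrelated
```

This invariance is what makes CKA useful here: two diverged networks can have very different weights yet still score high CKA if their intermediate representations match up to rotation.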
Problem

Research questions and friction points this paper is trying to address.

Study sensitivity of neural networks to initial conditions
Quantify divergence in training trajectories due to perturbations
Analyze impact on loss minima and model diversity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantify divergence via L2 distance and loss barrier
Analyze representational similarity of activations
Study perturbation effects on training trajectories
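The permutation-alignment idea referenced above can be sketched with a single hidden layer: match units of one network to the other before comparing weights. The sketch below uses the Hungarian algorithm (`scipy.optimize.linear_sum_assignment`) on row inner products; the helper `align_hidden_units` and the toy setup are assumptions for illustration, not the paper's implementation.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_hidden_units(W_a, W_b):
    """Permute the rows (hidden units) of W_b to best match W_a,
    maximising the total inner product between matched weight rows."""
    similarity = W_a @ W_b.T                       # (hidden, hidden) row similarities
    _, col = linear_sum_assignment(-similarity)    # negate to maximise
    return W_b[col]

rng = np.random.default_rng(1)
W_a = rng.normal(size=(8, 4))        # hidden-layer weights of network A
perm = np.roll(np.arange(8), 1)      # a known, non-trivial unit permutation
W_b = W_a[perm]                      # same units as A, shuffled order

aligned = align_hidden_units(W_a, W_b)
print(np.linalg.norm(W_a - W_b))       # large raw L2 gap
print(np.linalg.norm(W_a - aligned))   # 0.0 once units are re-matched
```

This separates two causes of a large raw L² distance: genuinely different functions versus identical units listed in a different order, which is why the paper reports distances and barriers both before and after alignment.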