🤖 AI Summary
This work investigates the gradient descent dynamics of ReLU networks under the Euclidean cost, focusing on the gradient flow of the input-layer weights and biases. It derives explicit gradient flow equations in coordinates adapted to the activations and identifies a mechanism of "dynamical data truncation," in which clusters of training data are truncated at an exponential rate during optimization, the rate accelerating through a positive feedback proportional to the number of already-truncated samples. A detailed analysis of the gradient flow equations yields several families of explicit solutions that quantitatively characterize this asymptotic simplification mechanism. The results provide a theoretical basis for interpreting optimization trajectories in deep learning and point to connections between implicit regularization and the evolution of model complexity.
📝 Abstract
We derive explicit equations governing the cumulative biases and weights in Deep Learning with ReLU activation function, based on gradient descent for the Euclidean cost in the input layer, and under the assumption that the weights are, in a precise sense, adapted to the coordinate system distinguished by the activations. We show that gradient descent corresponds to a dynamical process in the input layer, whereby clusters of data are progressively reduced in complexity ("truncated") at an exponential rate that increases with the number of data points that have already been truncated. We provide a detailed discussion of several types of solutions to the gradient flow equations. A main motivation for this work is to shed light on the interpretability question in supervised learning.
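The feedback mechanism described in the abstract can be caricatured by a toy simulation; this is an illustrative sketch only, not the paper's actual gradient flow equations. It assumes a hypothetical per-cluster "complexity" that decays exponentially, with a decay rate growing with the number of clusters already truncated (complexity below a threshold). All function names and parameter values here are assumptions made for illustration.

```python
import numpy as np

def simulate_truncation(n_clusters=10, gamma=0.5, eps=1e-3, dt=0.01, t_max=30.0):
    """Toy Euler simulation (illustrative assumption, not the paper's model):
    each cluster's 'complexity' decays exponentially, at a rate proportional
    to 1 + (number of clusters already truncated), i.e. positive feedback."""
    rng = np.random.default_rng(0)
    c = rng.uniform(0.5, 1.0, n_clusters)        # initial cluster complexities
    truncated_at = np.full(n_clusters, np.inf)   # truncation time per cluster
    t = 0.0
    while t < t_max and np.isfinite(truncated_at).sum() < n_clusters:
        k = np.isfinite(truncated_at).sum()      # clusters already truncated
        rate = gamma * (1.0 + k)                 # rate grows with k: feedback
        alive = ~np.isfinite(truncated_at)
        c[alive] *= np.exp(-rate * dt)           # exponential decay step
        truncated_at[alive & (c < eps)] = t      # mark newly truncated clusters
        t += dt
    return truncated_at

times = np.sort(simulate_truncation())
gaps = np.diff(times)
```

In this toy model, the first truncation takes the longest; once truncations begin, the growing rate makes the remaining ones bunch together, mimicking the accelerating exponential truncation the abstract describes.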