🤖 AI Summary
This work systematically investigates information propagation, training dynamics, and macroscopic limiting behavior of neural networks from a dynamical systems perspective. To address these questions, we propose an augmented neural ODE framework that unifies input-output mappings across diverse architectures; integrate Lyapunov exponent analysis with mean-field limit theory to characterize the stability of (stochastic) gradient descent and the emergence of implicit bias in the overparameterized regime; and extend digraph measure methods to establish a convergence theory for heterogeneous neural networks under graph limits, revealing for the first time their formal connection to Kuramoto-type synchronization models. Key contributions include: (i) a rigorous classification of the function classes representable by multilayer perceptrons and neural ODEs; (ii) a dynamical-systems explanation of the stability of stochastic gradient descent; and (iii) a mean-field analytical paradigm that scales to large graph neural networks. These results provide foundational dynamical insights for interpretable and robust AI, particularly in generative modeling and gradient-based optimization.
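To make the augmentation idea concrete, here is a minimal toy sketch (not from the chapter; the vector field and step count are illustrative choices). A one-dimensional neural ODE is a flow on the line and hence orientation-preserving, so it cannot represent the map x ↦ -x; after augmenting with one extra zero coordinate, a simple rotation field realizes it:

```python
import math

def vector_field(x, y):
    # Hand-picked linear field whose time-1 flow is a rotation by pi:
    # d/dt (x, y) = (-pi * y, pi * x)
    return (-math.pi * y, math.pi * x)

def flow(x0, steps=100_000):
    # Augment the scalar input x0 to (x0, 0), forward-Euler integrate
    # the ODE on [0, 1], and project back to the first coordinate.
    x, y = x0, 0.0
    h = 1.0 / steps
    for _ in range(steps):
        dx, dy = vector_field(x, y)
        x, y = x + h * dx, y + h * dy
    return x

print(flow(1.0))  # approximately -1.0: the augmented flow represents x -> -x
```

The projection of the augmented flow realizes x ↦ -x, which no unaugmented scalar flow can, illustrating why augmentation enlarges the representable function class.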
📝 Abstract
In this chapter, we utilize dynamical systems to analyze several aspects of machine learning algorithms. As an expository contribution, we demonstrate how to reformulate a wide variety of challenges from deep neural networks, (stochastic) gradient descent, and related topics into dynamical statements. We also tackle three concrete challenges. First, we consider the process of information propagation through a neural network, i.e., we study the input-output map for different architectures. We explain the universal embedding property for augmented neural ODEs representing arbitrary functions of given regularity, the classification of multilayer perceptrons and neural ODEs in terms of suitable function classes, and the memory dependence in neural delay equations. Second, we consider the training aspect of neural networks dynamically. We describe a dynamical systems perspective on gradient descent and study stability for overdetermined problems. We then extend this analysis to the overparameterized setting and describe the edge of stability phenomenon, also in the context of possible explanations for implicit bias. For stochastic gradient descent, we present stability results for the overparameterized setting via Lyapunov exponents of interpolation solutions. Third, we explain several results regarding mean-field limits of neural networks. We describe a result that extends existing techniques to heterogeneous neural networks involving graph limits via digraph measures. This shows how large classes of neural networks naturally fall within the framework of Kuramoto-type models on graphs and their large-graph limits. Finally, we point out that similar strategies of using dynamics to study explainable and reliable AI can also be applied to settings such as generative models, or to fundamental issues in gradient training methods such as backpropagation and vanishing/exploding gradients.
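The dynamical view of gradient descent can be sketched on a toy quadratic loss (an illustrative example, not the chapter's analysis): the update x ↦ x - h L'(x) is a discrete dynamical system whose fixed point at the minimum is stable exactly when the step size h stays below 2 divided by the curvature, the threshold behind the edge of stability phenomenon:

```python
def gd_map(x, h, lam):
    # One gradient-descent step on L(x) = lam/2 * x**2, i.e. the linear map
    # x -> x - h * L'(x) = (1 - h * lam) * x.
    return (1.0 - h * lam) * x

def run(h, lam=2.0, x0=1.0, steps=200):
    # Iterate the map: a discrete-time dynamical system with fixed point 0.
    x = x0
    for _ in range(steps):
        x = gd_map(x, h, lam)
    return x

# Stability boundary at h = 2 / lam = 1.0:
stable = run(h=0.9)    # |1 - h*lam| = 0.8 < 1: iterates contract to the minimum
unstable = run(h=1.1)  # |1 - h*lam| = 1.2 > 1: iterates diverge
```

The same multiplier condition |1 - h L''(x*)| < 1, generalized to eigenvalues of the Jacobian of the update map, is what a Lyapunov exponent analysis tracks in higher dimensions.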