On the Generalization Behavior of Deep Residual Networks From a Dynamical System Perspective

📅 2026-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of a unified theoretical framework for the generalization behavior of deep residual networks (ResNets) under both discrete and continuous-time formulations, particularly regarding discrepancies in sample complexity and underlying assumptions. By adopting a dynamical systems perspective and integrating Rademacher complexity, flow maps, and the convergence of ResNets in the infinite-depth limit, the paper establishes the first depth-independent generalization bound that incorporates a structure-dependent negative term. This approach unifies the characterization of generalization for both discrete and continuous ResNets under weaker assumptions and yields a generalization error bound of order $O(1/\sqrt{S})$ with respect to the number of training samples $S$, thereby effectively bridging the theoretical gap between the two settings.

📝 Abstract
Deep neural networks (DNNs) have significantly advanced machine learning, with model depth playing a central role in their successes. The dynamical system modeling approach has recently emerged as a powerful framework, offering new mathematical insights into the structure and learning behavior of DNNs. In this work, we establish generalization error bounds for both discrete- and continuous-time residual networks (ResNets) by combining Rademacher complexity, flow maps of dynamical systems, and the convergence behavior of ResNets in the deep-layer limit. The resulting bounds are of order $O(1/\sqrt{S})$ with respect to the number of training samples $S$, and include a structure-dependent negative term, yielding depth-uniform and asymptotic generalization bounds under milder assumptions. These findings provide a unified understanding of generalization across both discrete- and continuous-time ResNets, helping to close the gap in both the order of sample complexity and assumptions between the discrete- and continuous-time settings.
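As a rough illustration of the kind of result described above, the following is the standard Rademacher-complexity generalization bound, not the paper's exact statement (which additionally features a structure-dependent negative term and depth-uniformity). For a hypothesis class $\mathcal{F}$, a loss $\ell$ bounded in $[0,1]$, and an i.i.d. sample of size $S$, with probability at least $1-\delta$:

$$
\sup_{f \in \mathcal{F}} \left[ \mathbb{E}\,\ell(f(x), y) - \frac{1}{S}\sum_{i=1}^{S} \ell(f(x_i), y_i) \right] \le 2\,\mathfrak{R}_S(\ell \circ \mathcal{F}) + 3\sqrt{\frac{\log(2/\delta)}{2S}},
$$

where $\mathfrak{R}_S$ denotes the empirical Rademacher complexity of the loss class. When $\mathfrak{R}_S(\ell \circ \mathcal{F}) = O(1/\sqrt{S})$, as is typical for norm-constrained classes, the overall bound is of order $O(1/\sqrt{S})$, matching the sample-complexity rate stated in the abstract.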
Problem

Research questions and friction points this paper is trying to address.

generalization
residual networks
dynamical systems
Rademacher complexity
deep learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

generalization bounds
residual networks
dynamical systems
Rademacher complexity
deep-layer limit