🤖 AI Summary
This paper investigates theoretical guarantees for attaining zero training loss, and for the efficiency of its minimization, in overparameterized deep neural networks under supervised learning. For the ℓ² loss and generic training data, it gives a rigorous proof that zero loss is attainable in overparameterized networks of arbitrary depth, together with an explicit analytical construction of global minimizers that does not rely on gradient descent dynamics. Methodologically, the work combines nonconvex optimization analysis, a rank analysis of the training Jacobian, and function approximation theory, showing that increased depth can induce rank degeneration of the training Jacobian and thereby degrade the convergence of first-order methods. The main contributions are threefold: (1) an existence theory for zero-loss solutions applicable to general overparameterized deep networks; (2) a computationally tractable, closed-form construction of global minimizers; and (3) a quantitative characterization of the detrimental impact of depth on first-order optimization efficiency, clarifying the fundamental distinction between the under- and overparameterized deep learning regimes.
📝 Abstract
We determine sufficient conditions for overparametrized deep learning (DL) networks to guarantee the attainability of zero loss in the context of supervised learning, for the $\mathcal{L}^2$ cost and {\em generic} training data. We present an explicit construction of the zero loss minimizers without invoking gradient descent. On the other hand, we point out that an increase in depth can deteriorate the efficiency of cost minimization using a gradient descent algorithm, by analyzing the conditions for rank loss of the training Jacobian. Our results clarify key aspects of the dichotomy between zero loss reachability in underparametrized versus overparametrized DL.
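As an illustrative aside (not part of the paper's construction), the sketch below shows one way to numerically inspect the rank of the training Jacobian $D_\theta f(X;\theta)$ that the abstract refers to, for a small fully connected network of varying depth. The architecture, widths, activation, and random data here are arbitrary choices for illustration, not the setting analyzed in the paper.

```python
# Minimal sketch (assumptions: tanh MLP, random Gaussian data, arbitrary widths):
# compute the Jacobian of the stacked network outputs on the training set with
# respect to all parameters, and report its numerical rank for several depths.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_params(key, widths):
    # widths = [d_in, h, ..., h, d_out]
    params = []
    for i in range(len(widths) - 1):
        key, sub = jax.random.split(key)
        W = jax.random.normal(sub, (widths[i + 1], widths[i])) / jnp.sqrt(widths[i])
        b = jnp.zeros(widths[i + 1])
        params.append((W, b))
    return params

def forward(params, x):
    # hidden layers with tanh, linear output layer
    for W, b in params[:-1]:
        x = jnp.tanh(W @ x + b)
    W, b = params[-1]
    return W @ x + b

def stacked_outputs(params, X):
    # flatten all N outputs into one vector, so the Jacobian is (N*d_out, P)
    return jax.vmap(lambda x: forward(params, x))(X).ravel()

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (20, 4))  # N = 20 samples, input dimension 4

for depth in (2, 5, 10):
    widths = [4] + [32] * depth + [1]
    params = init_params(jax.random.PRNGKey(1), widths)
    flat, unravel = ravel_pytree(params)
    J = jax.jacobian(lambda p: stacked_outputs(unravel(p), X))(flat)
    rank = jnp.linalg.matrix_rank(J)
    print(f"depth={depth:2d}  Jacobian shape={J.shape}  numerical rank={rank}")
```

Full rank of this Jacobian (equal to the number of stacked output components) is the condition under which first-order updates can drive the ℓ² cost down efficiently; the paper's point is that depth can cause this rank to drop, which a numerical probe of this kind can help visualize, though the paper's results are analytical rather than empirical.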