AI Summary
This work investigates the optimization geometry of deep linear networks under the regularized squared loss. Despite the analytical challenges posed by nonconvexity and the hierarchical layer structure, we systematically characterize the geometry of the critical point set and establish, for the first time, necessary and sufficient conditions for the error bound to hold. Under mild assumptions, we prove that all critical points satisfy an error bound, thereby revealing the mechanism underlying the linear convergence of gradient descent; the analysis draws on tools from nonconvex optimization, critical point theory, and error bound analysis. Numerical experiments corroborate the theoretical predictions, confirming that gradient descent achieves linear convergence on the regularized loss. Together, these results provide a tight geometric characterization of the optimization landscape and furnish provable convergence guarantees for deep linear models.
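For reference, a standard instantiation of this setting (an assumption on our part; the paper's exact regularizer, scaling, and constants may differ) is an L-layer linear network trained on data X with targets Y under Frobenius-norm regularization, together with the error-bound inequality that drives linear convergence:

```latex
% Regularized squared loss of an L-layer linear network (typical form, assumed here).
\mathcal{L}(W_1,\dots,W_L)
  \;=\; \tfrac{1}{2}\bigl\lVert W_L W_{L-1}\cdots W_1 X - Y \bigr\rVert_F^2
  \;+\; \tfrac{\lambda}{2}\sum_{i=1}^{L}\lVert W_i\rVert_F^2,
  \qquad \lambda > 0.

% Error-bound property: near the critical point set \mathcal{X}, the distance to
% \mathcal{X} is controlled by the gradient norm; this is the standard mechanism
% behind the linear convergence of gradient descent.
\operatorname{dist}\bigl((W_1,\dots,W_L),\, \mathcal{X}\bigr)
  \;\le\; \kappa \,\bigl\lVert \nabla \mathcal{L}(W_1,\dots,W_L) \bigr\rVert_F
  \qquad \text{for some } \kappa > 0.
```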
Abstract
The optimization foundations of deep linear networks have recently received significant attention. However, owing to its non-convexity and hierarchical structure, the regularized loss of deep linear networks remains challenging to analyze. In this work, we study the local geometric landscape of the regularized squared loss of deep linear networks, providing a deeper understanding of its optimization properties. Specifically, we characterize the critical point set and establish an error-bound property for all critical points under mild conditions; notably, we identify the necessary and sufficient conditions under which the error bound holds. To support our theoretical findings, we conduct numerical experiments demonstrating that gradient descent exhibits linear convergence when optimizing the regularized loss of deep linear networks.
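To make the experimental claim concrete, here is a minimal NumPy sketch of gradient descent on a regularized squared loss of the form shown above (our assumed Frobenius-norm regularization; the depth, layer widths, regularization weight lam, step size lr, and iteration budget are illustrative choices, not the paper's configuration). Monitoring the gradient norm at checkpoints gives a quick check for geometric decay, which is the signature of linear convergence.

```python
# Minimal sketch, not the paper's experimental setup: plain gradient descent on
# the regularized squared loss of a deep linear network, with analytic gradients.
# Depth, widths, lam, lr, and the iteration count are illustrative assumptions;
# the observed decay profile depends on these choices.
import numpy as np

rng = np.random.default_rng(0)
L, d, n = 3, 5, 20              # depth, layer width, number of samples
lam, lr, T = 1e-1, 5e-3, 2000   # regularization weight, step size, iterations

X = rng.standard_normal((d, n))
Y = rng.standard_normal((d, n))
W = [0.5 * rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(L)]  # W_1..W_L

def loss(W):
    P = X
    for Wi in W:                # forward product W_L ... W_1 X
        P = Wi @ P
    data = 0.5 * np.sum((P - Y) ** 2)
    reg = 0.5 * lam * sum(np.sum(Wi ** 2) for Wi in W)
    return data + reg

def grads(W):
    acts = [X]                  # acts[i] = W_i ... W_1 X
    for Wi in W:
        acts.append(Wi @ acts[-1])
    back = acts[-1] - Y         # residual, then back = (W_L ... W_{i+1})^T residual
    G = []
    for i in reversed(range(L)):
        G.append(back @ acts[i].T + lam * W[i])
        back = W[i].T @ back
    return G[::-1]              # gradients ordered as W_1, ..., W_L

for t in range(1, T + 1):
    G = grads(W)
    for Wi, Gi in zip(W, G):
        Wi -= lr * Gi           # in-place gradient descent step
    if t % 200 == 0:
        gnorm = np.sqrt(sum(np.sum(Gi ** 2) for Gi in G))
        # Under an error bound, the gradient norm should shrink by a roughly
        # constant factor between checkpoints (geometric, i.e. linear, rate).
        print(f"iter {t:5d}  loss {loss(W):.6f}  grad norm {gnorm:.3e}")
```

Analytic gradients keep the sketch dependency-free; an autograd framework would serve equally well, and the same checkpointed gradient-norm readout applies.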