🤖 AI Summary
This work addresses the lack of analytical understanding of training dynamics in nonlinear autoencoders with a fixed, finite-dimensional bottleneck, where existing theory is largely restricted to linear models or settings without a bottleneck. It uses a mean-field approach to derive explicit learning dynamics for both the encoder and the decoder. Over finite time horizons, the empirical risk of finite-width networks trained with stochastic gradient descent is shown to track the mean-field risk trajectory closely with high probability. At optimality, the finite-width risk converges to the mean-field optimum, indicating that finite networks are expressive enough to approximate the infinite-width solution. Together, these results provide the first tractable characterization of mean-field training dynamics for nonlinear autoencoders with a finite bottleneck.
📝 Abstract
Autoencoders (AEs) learn low-dimensional representations by mapping data into a latent space while minimizing reconstruction error. Despite their empirical success, theoretical understanding remains limited and largely restricted to linear models or settings without a bottleneck. In this work, we study nonlinear AEs with a fixed finite-dimensional bottleneck in the mean-field (MF) regime. We derive explicit MF learning dynamics for both encoder and decoder, providing a tractable characterization of training in the nonlinear setting. We show that, over finite time horizons, the empirical risk of finite-width networks trained with stochastic gradient descent closely tracks the MF risk trajectory with high probability. At optimality, we further establish that the finite-width risk converges to the MF optimum, demonstrating that finite networks are sufficiently expressive to approximate the infinite-width solution.
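To make the setting concrete, the following is a minimal sketch of a width-N autoencoder with a fixed k-dimensional bottleneck under mean-field scaling, together with one online SGD step. It is an illustration under stated assumptions, not the construction used in the paper: the two-layer tanh encoder and decoder, the Gaussian initialization, the squared reconstruction loss, and all names (`init_params`, `forward`, `loss`, `sgd_step`) are hypothetical. The two points it is meant to show are the 1/N averaging over neurons (rather than 1/sqrt(N)) and the step size scaled by N, so that each neuron moves at a rate that does not vanish as the width grows.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def init_params(d, k, n, rng):
    """Draw n encoder and n decoder neurons i.i.d. (assumed Gaussian init)."""
    def gauss(m):
        return [rng.gauss(0.0, 1.0) for _ in range(m)]
    return {
        "w": [gauss(d) for _ in range(n)],  # encoder input weights, each in R^d
        "a": [gauss(k) for _ in range(n)],  # encoder output weights, each in R^k
        "u": [gauss(k) for _ in range(n)],  # decoder input weights, each in R^k
        "b": [gauss(d) for _ in range(n)],  # decoder output weights, each in R^d
    }


def forward(p, x):
    """Return (z, x_hat, s, h): latent code, reconstruction, hidden activations."""
    n, k, d = len(p["w"]), len(p["a"][0]), len(x)
    s = [math.tanh(dot(w, x)) for w in p["w"]]
    # Mean-field scaling: average over neurons with a 1/n factor.
    z = [sum(p["a"][i][c] * s[i] for i in range(n)) / n for c in range(k)]
    h = [math.tanh(dot(u, z)) for u in p["u"]]
    x_hat = [sum(p["b"][j][c] * h[j] for j in range(n)) / n for c in range(d)]
    return z, x_hat, s, h


def loss(p, x):
    """Squared reconstruction error on a single sample."""
    _, x_hat, _, _ = forward(p, x)
    return 0.5 * sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))


def sgd_step(p, x, lr):
    """One online SGD step on sample x with effective step size lr * n."""
    n = len(p["w"])
    z, x_hat, s, h = forward(p, x)
    r = [xh - xi for xh, xi in zip(x_hat, x)]  # reconstruction residual
    rb = [dot(r, p["b"][j]) * (1.0 - h[j] ** 2) for j in range(n)]
    # Gradient of the loss with respect to the latent code z.
    g = [sum(rb[j] * p["u"][j][c] for j in range(n)) / n for c in range(len(z))]
    for i in range(n):
        ga = dot(g, p["a"][i]) * (1.0 - s[i] ** 2)
        # Each per-neuron gradient carries a 1/n factor; the step size lr * n
        # cancels it, so the updates below use lr alone.
        p["b"][i] = [bc - lr * h[i] * rc for bc, rc in zip(p["b"][i], r)]
        p["u"][i] = [uc - lr * rb[i] * zc for uc, zc in zip(p["u"][i], z)]
        p["a"][i] = [ac - lr * s[i] * gc for ac, gc in zip(p["a"][i], g)]
        p["w"][i] = [wc - lr * ga * xc for wc, xc in zip(p["w"][i], x)]
```

Averaging `loss` over a held-out sample after repeated calls to `sgd_step` gives an empirical risk trajectory. Repeating this for increasing `n` is one way to probe numerically how the finite-width risk behaves as the width grows, though this sketch says nothing about the rates or constants established in the paper.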