🤖 AI Summary
This work addresses the inefficiency and overfitting challenges posed by the massive parameter counts of modern deep learning models. Building on the hypothesis that trained weights reside on a low-dimensional smooth manifold, the authors propose replacing high-dimensional weight tensors with compact, trainable latent vectors. A dedicated mapping network and associated loss function are introduced to enable highly efficient reconstruction of the original weights from an extremely low-dimensional latent space. Theoretical analysis and extensive experiments demonstrate, for the first time, the feasibility of this approach: across tasks including image classification and deepfake detection, models reconstructed from latent representations using only 0.5% of the original parameters (a 99.5% compression ratio, i.e., roughly a 200× reduction) not only match but often surpass the performance of the original full-parameter models, substantially mitigating overfitting.
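The core idea above can be sketched in a few lines: a tiny trainable latent vector is pushed through a mapping function to reconstruct the target layer's full weight tensor, so only the latent (0.5% of the original size) counts as trainable. This is a minimal, pure-Python illustration of that idea, not the authors' implementation; all names (`MappingNetwork`, `latent_dim`, the single linear mapping layer) are assumptions, and in practice the mapping network itself would have to be parameter-efficient (e.g., fixed or shared across layers), as it is here.

```python
# Hypothetical sketch of the latent-to-weight mapping idea (illustrative only).
import random

random.seed(0)

def linear(x, W, b):
    """Plain-Python linear map: y = W @ x + b."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

class MappingNetwork:
    """Maps a compact trainable latent vector to a full weight tensor.

    Only `self.z` is treated as trainable; the mapping itself is frozen
    here so the trainable-parameter count stays at latent_dim.
    """
    def __init__(self, latent_dim, target_params):
        # Trainable latent vector (what would be optimized during training).
        self.z = [random.gauss(0.0, 1.0) for _ in range(latent_dim)]
        # Frozen linear mapping from latent space to the flattened weights.
        self.W = [[random.gauss(0.0, 0.01) for _ in range(latent_dim)]
                  for _ in range(target_params)]
        self.b = [0.0] * target_params

    def reconstruct(self):
        """Reconstruct the target layer's flattened weights from the latent."""
        return linear(self.z, self.W, self.b)

target_params = 10_000   # size of the target layer's flattened weight tensor
latent_dim = 50          # 0.5% of the target: a 99.5% (~200x) reduction
net = MappingNetwork(latent_dim, target_params)
weights = net.reconstruct()
print(len(weights), latent_dim / target_params)  # 10000 0.005
```

The ratio printed at the end makes the compression arithmetic explicit: keeping 0.5% of the parameters as trainable latents corresponds to a 99.5% reduction, i.e., a factor of about 200.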
📝 Abstract
The escalating parameter counts of modern deep learning models pose a fundamental challenge to efficient training and to mitigating overfitting. We address this by introducing \emph{Mapping Networks}, which replace the high-dimensional weight space with a compact, trainable latent vector, based on the hypothesis that the trained parameters of large networks reside on smooth, low-dimensional manifolds. The Mapping Theorem, enforced by a dedicated Mapping Loss, establishes the existence of a mapping from this latent space to the target weight space, both theoretically and in practice. Mapping Networks significantly reduce overfitting and achieve performance comparable to, and often better than, the target network across complex vision and sequence tasks, including image classification and deepfake detection, with a $\mathbf{99.5\%}$ (roughly $200\times$) reduction in trainable parameters.