🤖 AI Summary
This work addresses the theoretical bottleneck of vacuous generalization bounds for deep neural networks: bounds that are overly loose and often require modifying the model itself. We propose a non-vacuous upper bound on the test error that depends solely on the training set and requires no architectural or training-time alterations (e.g., compression, quantization, or retraining). Methodologically, we unify the PAC-Bayes and mutual information frameworks, constructing a tight bound from the empirical risk and statistical properties of the training data. Our key contributions are threefold: (i) the first non-vacuous generalization guarantee for unmodified, large-scale ImageNet-pretrained models, including ResNet and ViT; (ii) two novel generalization bounds that are simultaneously theoretically rigorous and computationally tractable; and (iii) empirical validation showing bound values significantly below 1, establishing the largest-scale non-vacuous generalization guarantee to date.
📝 Abstract
Deep neural networks (NNs) with millions or billions of parameters can perform remarkably well on unseen data after being trained on a finite training set. Various prior theories have been developed to explain this ability of NNs, but they do not provide a meaningful bound on the test error. Some recent theories, based on PAC-Bayes and mutual information, are non-vacuous and hence show great potential to explain the excellent performance of NNs. However, they often require stringent assumptions and extensive modifications (e.g., compression, quantization) to the trained model of interest, and therefore provide a guarantee for the modified versions only. In this paper, we propose two novel bounds on the test error of a model. Our bounds use the training set only and require no modification to the model. We verify these bounds on a large class of modern NNs, pretrained by PyTorch on the ImageNet dataset, and find them to be non-vacuous. To the best of our knowledge, these are the first non-vacuous bounds at this large scale that require no modification to the pretrained models.
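For context, the PAC-Bayes framework that the abstract refers to can be sketched by its classical bound (McAllester's bound with Maurer's refinement); this is standard background, not the paper's new result:

```latex
% With probability at least 1 - \delta over an i.i.d. training sample of size n:
\[
\mathbb{E}_{h \sim Q}\big[ L(h) \big]
\;\le\;
\mathbb{E}_{h \sim Q}\big[ \hat{L}(h) \big]
+ \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln \frac{2\sqrt{n}}{\delta}}{2n}}
\]
% P : prior over hypotheses, chosen before seeing the data
% Q : posterior over hypotheses, may depend on the data
% L(h) : true (test) risk of hypothesis h;  \hat{L}(h) : empirical (training) risk
```

A bound of this form is vacuous whenever its right-hand side exceeds 1, which is what typically happens for large unmodified networks; the paper's contribution is constructing bounds that remain below 1 at ImageNet scale without altering the model.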