🤖 AI Summary
This work addresses the theoretical bottleneck of vacuous generalization bounds for deep neural networks: bounds that are overly loose and often require modifying the model itself. We propose a non-vacuous upper bound on the test error that depends solely on the training set and requires no architectural or training-time alterations (e.g., compression, quantization, or retraining). Methodologically, we unify the PAC-Bayes and mutual information frameworks, constructing a tight bound from the empirical risk and statistical properties of the training data. Our key contributions are threefold: (i) the first non-vacuous generalization guarantee for unmodified, large-scale ImageNet-pretrained models, including ResNet and ViT; (ii) two novel generalization bounds that are simultaneously theoretically rigorous and computationally tractable; and (iii) empirical validation showing bound values significantly below 1, establishing the largest-scale non-vacuous generalization guarantee to date.
📝 Abstract
Deep neural networks (NNs) with millions or billions of parameters can perform remarkably well on unseen data after being trained on a finite training set. Various prior theories have been developed to explain this ability of NNs, but they do not provide a meaningful bound on the test error. Some recent theories, based on PAC-Bayes and mutual information, are non-vacuous and hence show great potential to explain the excellent performance of NNs. However, they often require stringent assumptions and extensive modifications (e.g., compression, quantization) to the trained model of interest, and therefore provide a guarantee for the modified versions only. In this paper, we propose two novel bounds on the test error of a model. Our bounds use the training set only and require no modification to the model. We verify these bounds on a large class of modern NNs, pretrained by PyTorch on the ImageNet dataset, and find them to be non-vacuous. To the best of our knowledge, these are the first non-vacuous bounds at this large scale that require no modification to the pretrained models.
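For context, the PAC-Bayes framework that the abstract refers to can be sketched by its classical bound (McAllester's bound with Maurer's refinement); this is standard background, not the paper's new result:

```latex
% With probability at least 1 - \delta over an i.i.d. training sample of size n:
\[
\mathbb{E}_{h \sim Q}\big[ L(h) \big]
\;\le\;
\mathbb{E}_{h \sim Q}\big[ \hat{L}(h) \big]
+ \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln \frac{2\sqrt{n}}{\delta}}{2n}}
\]
% P : prior over hypotheses, chosen before seeing the data
% Q : posterior over hypotheses, may depend on the data
% L(h) : true (test) risk of hypothesis h;  \hat{L}(h) : empirical (training) risk
```

A bound of this form is vacuous whenever its right-hand side exceeds 1, which is what typically happens for large unmodified networks; the paper's contribution is constructing bounds that remain below 1 at ImageNet scale without altering the model.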