🤖 AI Summary
This work addresses the overfitting of conventional Bayesian predictors in noisy binary classification, where insufficient regularization can lead to significant excess risk. The authors study a PAC-Bayes type learning rule that controls regularization strength by balancing the training error of a randomized posterior predictor against its KL divergence from a pre-specified prior. The key contribution is characterizing how the trade-off parameter λ governs generalization: choosing λ ≫ 1, growing with the sample size, which can be viewed as using a sample-size-dependent prior, guarantees that the excess risk vanishes uniformly even in the agnostic case. The analysis extends earlier discrete-prior results to continuous priors and randomized (Bayesian) predictors and, through a modified two-part-code MDL perspective, quantitatively delineates the boundary between under- and over-regularization, thereby mitigating the overfitting inherent in standard Bayesian prediction.
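For reference, the learning rule discussed here can be written in a generic PAC-Bayes form (a sketch under one common parametrization; the paper's exact normalization and its treatment of the noise level may differ), where $\widehat{L}_S(h)$ is the empirical loss of hypothesis $h$ on $n$ training samples, $\pi$ is the prior, and $\rho$ ranges over posteriors:

$$
\hat{\rho}_\lambda \;=\; \arg\min_{\rho}\; \mathbb{E}_{h\sim\rho}\big[\widehat{L}_S(h)\big] \;+\; \lambda\,\frac{\mathrm{KL}(\rho\,\|\,\pi)}{n},
\qquad
\hat{\rho}_\lambda(h) \;\propto\; \pi(h)\,\exp\!\Big(-\tfrac{n}{\lambda}\,\widehat{L}_S(h)\Big).
$$

Under this parametrization, $\lambda = 1$ with log-loss gives the usual Gibbs/Bayes-type posterior, while letting $\lambda$ grow with $n$ down-weights the data relative to the prior, i.e. the stronger regularization referred to above.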
📝 Abstract
We consider a PAC-Bayes type learning rule for binary classification, balancing the training error of a randomized "posterior" predictor with its KL divergence to a pre-specified "prior". This can be seen as an extension of a modified two-part-code Minimum Description Length (MDL) learning rule to continuous priors and randomized predictions. With a balancing parameter of $\lambda=1$ this learning rule recovers an (empirical) Bayes posterior, and a modified variant recovers the profile posterior, linking it with standard Bayesian prediction (up to the treatment of the single-parameter noise level). However, from a risk-minimization prediction perspective, this Bayesian predictor overfits and can lead to non-vanishing excess loss in the agnostic case. Instead, a choice of $\lambda\gg 1$, which can be seen as using a sample-size-dependent prior, ensures uniformly vanishing excess loss even in the agnostic case. We precisely characterize the effect of under-regularizing (and over-regularizing) as a function of the balance parameter $\lambda$, identifying the regimes in which this under-regularization is tempered or catastrophic. This work extends the previous work of Zhu and Srebro [2025], which considered only discrete priors, to PAC-Bayes type learning rules and, through their rigorous Bayesian interpretation, to Bayesian prediction more generally.
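As a toy illustration of the $\lambda$ trade-off (not the paper's construction, which handles continuous priors and the agnostic case), the sketch below evaluates the Gibbs-form posterior from the summary above on a small finite hypothesis class; the prior, empirical errors, and sample size are made-up values for illustration only:

```python
import numpy as np

# Illustrative sketch (not the paper's code): Gibbs-form posterior over a finite
# hypothesis class, showing how the balance parameter lambda controls how far the
# posterior moves from the prior toward low-training-error hypotheses.

def gibbs_posterior(prior, train_err, n, lam):
    """prior: prior weights over hypotheses; train_err: empirical error of each
    hypothesis on n samples; lam: balance/regularization parameter lambda."""
    log_w = np.log(prior) - (n / lam) * train_err   # log pi(h) - (n/lambda) * L_hat(h)
    log_w -= log_w.max()                            # stabilize before exponentiating
    w = np.exp(log_w)
    return w / w.sum()

prior = np.array([0.25, 0.25, 0.25, 0.25])          # uniform prior over 4 hypotheses
train_err = np.array([0.10, 0.12, 0.30, 0.45])      # hypothetical empirical errors, n = 200
for lam in (1.0, 10.0, np.sqrt(200)):
    post = gibbs_posterior(prior, train_err, n=200, lam=lam)
    print(f"lambda = {lam:6.2f}  posterior weights = {np.round(post, 3)}")
```

With $\lambda = 1$ the posterior concentrates sharply on the empirically best hypothesis (the Bayes-like, potentially overfitting regime), while larger $\lambda$ keeps the posterior closer to the prior, mirroring the stronger regularization the abstract associates with $\lambda \gg 1$.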