A DPI-PAC-Bayesian Framework for Generalization Bounds

📅 2025-07-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the looseness of PAC-Bayesian generalization bounds in supervised learning. Methodologically, it introduces a unified analytical framework that embeds the Data Processing Inequality (DPI) into the PAC-Bayesian change-of-measure argument (presented as the first incorporation of DPI into PAC-Bayes derivations), bounding the binary KL generalization gap in terms of divergences between a data-independent prior and an algorithm-dependent posterior. Beyond the KL divergence, the framework extends to the broader $f$-divergence family, including the Rényi, Hellinger-$p$, and $\chi^2$ divergences, yielding closed-form generalization error upper bounds. When the prior is uniform, the new bounds recover the classical Occam's Razor bound exactly while eliminating the extraneous $\log(2\sqrt{n})/n$ slack term of the standard PAC-Bayes bound, thereby achieving tighter guarantees. The framework thus unifies the PAC-Bayesian and information-theoretic perspectives on generalization analysis.
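
For context, the two classical results the summary refers to can be stated as follows. These are standard formulations from the PAC-Bayes literature, reproduced for reference rather than quoted from the paper; $\hat{L}$ and $L$ denote empirical and true risk, $\mathrm{kl}$ the binary KL divergence, and $n$ the sample size.

```latex
% Occam's Razor bound: countable hypothesis class H with data-independent
% prior P on H. With probability at least 1 - \delta, for all h in H:
\mathrm{kl}\big(\hat{L}(h) \,\|\, L(h)\big) \le \frac{\log\frac{1}{P(h)} + \log\frac{1}{\delta}}{n}

% Classical PAC-Bayes (kl) bound: prior P and posterior Q over H.
% With probability at least 1 - \delta:
\mathrm{kl}\big(\hat{L}(Q) \,\|\, L(Q)\big) \le \frac{\mathrm{KL}(Q \,\|\, P) + \log\frac{2\sqrt{n}}{\delta}}{n}
```

The gap between the two complexity terms is exactly the $\log(2\sqrt{n})/n$ slack that the proposed framework eliminates when the prior is uniform.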

📝 Abstract
We develop a unified Data Processing Inequality PAC-Bayesian framework, abbreviated DPI-PAC-Bayesian, for deriving generalization error bounds in the supervised learning setting. By embedding the Data Processing Inequality (DPI) into the change-of-measure technique, we obtain explicit bounds on the binary Kullback-Leibler generalization gap for both the Rényi divergence and any $f$-divergence measured between a data-independent prior distribution and an algorithm-dependent posterior distribution. We present three bounds derived under our framework, using the Rényi, Hellinger-$p$, and $\chi^2$ divergences. Our framework also exhibits close connections with other well-known bounds. When the prior distribution is chosen to be uniform, our bounds recover the classical Occam's Razor bound and, crucially, eliminate the extraneous $\log(2\sqrt{n})/n$ slack present in the PAC-Bayes bound, thereby achieving tighter results. The framework thus bridges the data-processing and PAC-Bayesian perspectives, providing a flexible, information-theoretic tool for constructing generalization guarantees.
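
As a minimal numeric sketch (illustrative, not code from the paper) of how bounds on the binary KL gap are used in practice: given an empirical risk and a complexity term $c$, the implied upper bound on the true risk is obtained by inverting $\mathrm{kl}(\hat{L} \,\|\, \cdot) \le c$, here by bisection. The sample size, confidence level, and uniform prior over 100 hypotheses are assumptions chosen purely for illustration.

```python
import math

def kl_bernoulli(q: float, p: float) -> float:
    """Binary KL divergence kl(q || p) between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)
    p = min(max(p, eps), 1 - eps)
    return q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

def kl_inverse_upper(q_hat: float, c: float, tol: float = 1e-10) -> float:
    """Largest p >= q_hat with kl(q_hat || p) <= c, found by bisection
    (kl(q_hat || p) is increasing in p on [q_hat, 1))."""
    lo, hi = q_hat, 1.0 - 1e-12
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kl_bernoulli(q_hat, mid) <= c:
            lo = mid
        else:
            hi = mid
    return lo

# Illustrative numbers (assumptions, not taken from the paper):
n, delta = 1000, 0.05
emp_risk = 0.10                  # empirical risk of the selected hypothesis
log_inv_prior = math.log(100)    # log(1/P(h)) for a uniform prior over 100 hypotheses

occam_c = (log_inv_prior + math.log(1 / delta)) / n      # slack-free complexity term
pacbayes_c = occam_c + math.log(2 * math.sqrt(n)) / n    # adds log(2*sqrt(n))/n slack

print(f"Occam-style bound on true risk: {kl_inverse_upper(emp_risk, occam_c):.4f}")
print(f"With log(2*sqrt(n))/n slack:    {kl_inverse_upper(emp_risk, pacbayes_c):.4f}")
```

Removing the slack shrinks the complexity term, which translates directly into a smaller risk bound after inversion.
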
Problem

Research questions and friction points this paper is trying to address.

How to tighten generalization error bounds for supervised learning beyond the classical PAC-Bayes bound
How to extend change-of-measure arguments from the KL divergence to Rényi and general $f$-divergences between prior and posterior
How to connect the data-processing and PAC-Bayesian perspectives on generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified DPI-PAC-Bayesian framework for generalization bounds
Bounds using Rényi, Hellinger-$p$, and $\chi^2$ divergences (standard definitions recalled after this list)
Tighter results by eliminating extraneous slack terms
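
The three divergences named above belong to the $f$-divergence family; for discrete distributions $Q$ and $P$ their textbook definitions are as follows (standard definitions, not excerpts from the paper):

```latex
% Renyi divergence of order \alpha (\alpha > 0, \alpha \neq 1):
D_{\alpha}(Q \,\|\, P) = \frac{1}{\alpha - 1} \log \sum_{x} Q(x)^{\alpha} P(x)^{1-\alpha}

% Hellinger divergence of order p (p > 0, p \neq 1):
\mathcal{H}_{p}(Q \,\|\, P) = \frac{1}{p - 1} \Big( \sum_{x} Q(x)^{p} P(x)^{1-p} - 1 \Big)

% Chi-squared divergence:
\chi^{2}(Q \,\|\, P) = \sum_{x} \frac{Q(x)^{2}}{P(x)} - 1
```

Note that $\chi^{2}(Q\|P) = \mathcal{H}_{2}(Q\|P)$ and that $D_{\alpha}$ is a monotone transform of $\mathcal{H}_{\alpha}$, which is why a single $f$-divergence argument can cover all three cases.
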
Muhan Guan
Department of EEE, University of Melbourne, Parkville, Victoria, Australia
Farhad Farokhi
Department of EEE, University of Melbourne, Parkville, Victoria, Australia
Jingge Zhu
University of Melbourne
Information Theory · Communication Systems · Statistical Learning Theory