🤖 AI Summary
This work addresses the excessive looseness of PAC-Bayesian generalization bounds in supervised learning. Methodologically, it introduces a unified analytical framework integrating the Data Processing Inequality (DPI) with PAC-Bayes theory—specifically, the first incorporation of DPI into PAC-Bayes derivations—to explicitly quantify information loss, measured by KL divergence, between prior and posterior distributions. This eliminates redundant slack terms arising from independence assumptions in classical Occam’s Razor bounds. Leveraging change-of-measure techniques, the framework extends KL-based bounds to the broader *f*-divergence family—including Rényi, Hellinger-*p*, and χ² divergences—yielding tight, closed-form generalization error upper bounds. Theoretically, the new bounds recover classical results exactly under uniform priors and are provably strictly tighter in all other cases. The framework establishes an intrinsic unification between PAC-Bayes theory and information-theoretic generalization analysis, significantly enhancing both the precision and applicability of theoretical guarantees.
📝 Abstract
We develop a unified Data Processing Inequality PAC-Bayesian framework, abbreviated DPI-PAC-Bayesian, for deriving generalization error bounds in the supervised learning setting. By embedding the Data Processing Inequality (DPI) into the change-of-measure technique, we obtain explicit bounds on the binary Kullback-Leibler generalization gap for both the Rényi divergence and any $f$-divergence measured between a data-independent prior distribution and an algorithm-dependent posterior distribution. We present three bounds derived under our framework using the Rényi, Hellinger-$p$, and Chi-squared ($\chi^2$) divergences. Our framework also demonstrates a close connection with other well-known bounds. When the prior distribution is chosen to be uniform, our bounds recover the classical Occam's Razor bound and, crucially, eliminate the extraneous $\log(2\sqrt{n})/n$ slack present in the PAC-Bayes bound, thereby achieving tighter results. The framework thus bridges the data-processing and PAC-Bayesian perspectives, providing a flexible, information-theoretic tool for constructing generalization guarantees.
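To make the slack term concrete, it may help to recall the two classical bounds being compared (these standard forms are not restated in the abstract, so the notation here is our own). With empirical risk $\hat{e}$, true risk $e$, sample size $n$, confidence $1-\delta$, and $\mathrm{kl}(\cdot\|\cdot)$ the binary KL divergence, the Occam's Razor bound for a countable hypothesis class with prior $P$ and the Maurer form of the PAC-Bayes bound read, respectively:

```latex
% Occam's Razor bound: holds for each hypothesis h with probability >= 1 - delta
\mathrm{kl}\!\left(\hat{e}(h)\,\|\,e(h)\right)
  \le \frac{\ln\frac{1}{P(h)} + \ln\frac{1}{\delta}}{n}

% Maurer's PAC-Bayes bound: holds for all posteriors Q simultaneously
\mathrm{kl}\!\left(\hat{e}_Q\,\|\,e_Q\right)
  \le \frac{\mathrm{KL}(Q\,\|\,P) + \ln\frac{2\sqrt{n}}{\delta}}{n}
```

Taking $Q$ to be a point mass on $h$ gives $\mathrm{KL}(Q\|P) = \ln\frac{1}{P(h)}$, so the PAC-Bayes bound carries an extra $\ln(2\sqrt{n})/n$ term relative to Occam's Razor; this is the slack the DPI-PAC-Bayesian bounds are stated to remove.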