Fast Rate Information-theoretic Bounds on Generalization Errors

📅 2023-03-26
📈 Citations: 2
Influential: 0
🤖 AI Summary
This work studies how tightly information-theoretic generalization error bounds track the sample size $n$, focusing on the looseness of the individual-sample mutual information (ISMI) bound. The ISMI bound generally converges at the suboptimal rate $O(1/\sqrt{n})$; the paper shows that placing the assumption on the excess risk, rather than on the loss function, recovers the fast rate $O(1/n)$ and makes the same bound asymptotically tight. It also proposes a new set of generalization bounds based on the $(\eta, c)$-central condition, under which the mutual information term directly determines the convergence rate. Analytical and numerical examples, including quadratic Gaussian mean estimation, illustrate the effectiveness of these bounds.
📝 Abstract
The generalization error of a learning algorithm is the discrepancy between its loss on training data and its loss on unseen test data. Various information-theoretic bounds on the generalization error have been derived in the literature, in which the mutual information between the training data and the hypothesis (the output of the learning algorithm) plays an important role. Focusing on the individual sample mutual information bound by Bu et al., itself a tightened version of the first bounds on the topic by Russo et al. and Xu et al., this paper investigates the tightness of these bounds in terms of how their convergence rates depend on the sample size $n$. It has been recognized that these bounds are in general not tight. This is readily verified for the quadratic Gaussian mean estimation problem, where the individual sample mutual information bound scales as $O(\sqrt{1/n})$ while the true generalization error scales as $O(1/n)$. The first contribution of this paper is to show that the same bound can in fact be asymptotically tight if an appropriate assumption is made. In particular, the fast rate can be recovered when the assumption is made on the excess risk instead of on the loss function, as is usual in the existing literature. A theoretical justification is given for this choice. The second contribution is a new set of generalization error bounds based on the $(\eta, c)$-central condition, a condition that is relatively easy to verify and under which the mutual information term directly determines the convergence rate of the bound. Several analytical and numerical examples are given to show the effectiveness of these bounds.
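The rate gap in the quadratic Gaussian mean estimation problem can be checked numerically. The sketch below is illustrative only and is not taken from the paper: it assumes i.i.d. samples $Z_i \sim N(\mu, \sigma^2 I_d)$, the squared loss $\ell(w, z) = \|w - z\|^2$, and the sample mean as the learned hypothesis $W$. Under these assumptions the expected generalization error is $2\sigma^2 d/n$ and $I(W; Z_i) = \frac{d}{2}\log\frac{n}{n-1}$. The function names are hypothetical, and `sqrt_mi_proxy` deliberately drops the constant factor of the actual bound, so it shows only the $1/\sqrt{n}$ scaling.

```python
import numpy as np


def true_gen_error(n, d=1, sigma=1.0):
    """Closed-form expected generalization error 2*sigma^2*d/n (sample-mean hypothesis)."""
    return 2.0 * sigma**2 * d / n


def ismi(n, d=1):
    """I(W; Z_i) in nats when W is the mean of n i.i.d. d-dimensional Gaussian samples."""
    return 0.5 * d * np.log(n / (n - 1.0))


def sqrt_mi_proxy(n, d=1):
    """sqrt(2 * I(W; Z_i)): the rate-determining term of a square-root-type bound.

    Assumption: the problem-dependent constant of the real bound is omitted.
    """
    return float(np.sqrt(2.0 * ismi(n, d)))


def mc_gen_error(n, d=1, sigma=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of E[test loss - training loss] for the sample mean."""
    rng = np.random.default_rng(seed)
    z = rng.normal(0.0, sigma, size=(trials, n, d))      # training sets (mu = 0 w.l.o.g.)
    w = z.mean(axis=1)                                   # hypothesis per trial, shape (trials, d)
    train = ((z - w[:, None, :]) ** 2).sum(axis=2).mean(axis=1)
    z_new = rng.normal(0.0, sigma, size=(trials, d))     # one fresh test point per trial
    test = ((z_new - w) ** 2).sum(axis=1)
    return float((test - train).mean())


if __name__ == "__main__":
    for n in (10, 100, 1000):
        # The first column decays like 1/n, the second like 1/sqrt(n).
        print(n, true_gen_error(n), sqrt_mi_proxy(n))
    print("Monte Carlo estimate at n=10:", mc_gen_error(10))
```

Multiplying `sqrt_mi_proxy(n)` by $\sqrt{n}$ and `true_gen_error(n)` by $n$ gives roughly constant values as $n$ grows, which is the mismatch in rates that the abstract describes.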
Problem

Research questions and friction points this paper is trying to address.

Investigates tightness of generalization error bounds
Shows fast rate recovery under excess risk assumption
Proposes new bounds using (η, c)-central condition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses individual sample mutual information bounds
Places the assumption on the excess risk, rather than the loss, to recover the fast rate
Introduces bounds based on (η, c)-central condition