Sample Compression Unleashed: New Generalization Bounds for Real-Valued Losses

📅 2024-09-26
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Classical sample compression generalization theory has long been restricted to bounded zero-one loss, limiting its applicability to modern real-valued, unbounded loss settings. Method: We propose a general compression paradigm based on the Pick-To-Learn (P2L) meta-algorithm, which transforms any learning algorithm into a compressible predictor and yields model-agnostic, theoretically rigorous upper bounds on generalization error. Contribution/Results: Our bounds dispense with the conventional assumption of loss function boundedness, thereby significantly broadening applicability to contemporary models—including deep neural networks and random forests. Empirical evaluation across multiple neural network architectures and random forests demonstrates both tightness and computability of the derived bounds. To our knowledge, this work provides the first verifiable, computable generalization error bound for real-valued loss functions within the sample compression framework.

📝 Abstract
The sample compression theory provides generalization guarantees for predictors that can be fully defined using a subset of the training dataset and a (short) message string, generally defined as a binary sequence. Previous works provided generalization bounds for the zero-one loss, which is restrictive notably when applied to deep learning approaches. In this paper, we present a general framework for deriving new sample compression bounds that hold for real-valued unbounded losses. Using the Pick-To-Learn (P2L) meta-algorithm, which transforms the training method of any machine-learning predictor to yield sample-compressed predictors, we empirically demonstrate the tightness of the bounds and their versatility by evaluating them on random forests and multiple types of neural networks.
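The P2L loop described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: `train` and `loss` are hypothetical stand-ins for any learner's training routine and per-example loss, and the stopping threshold is an assumed parameter.

```python
def pick_to_learn(dataset, train, loss, threshold):
    """Greedy P2L sketch: repeatedly add the worst-fit training point to the
    compression set and retrain, until every remaining point is fit within
    `threshold`. The returned predictor is fully reconstructable from the
    compression set alone, which is what enables sample compression bounds."""
    compression_set = []
    model = train(compression_set)  # initial predictor (e.g. from a prior)
    while True:
        remaining = [p for p in dataset if p not in compression_set]
        if not remaining:
            break
        worst = max(remaining, key=lambda p: loss(model, p))
        if loss(model, worst) <= threshold:
            break                             # all points fit well enough: stop
        compression_set.append(worst)         # pick the hardest point...
        model = train(compression_set)        # ...and retrain on the set
    return model, compression_set
```

The points left outside the compression set act as held-out data for the reconstructed predictor, which is what makes the resulting generalization bounds computable.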
Problem

Research questions and friction points this paper is trying to address.

Classical sample compression bounds hold only for the bounded zero-one loss.
Real-valued, unbounded losses used in deep learning lack comparable guarantees.
Generalization bounds for modern models such as neural networks and random forests are often vacuous or not computable.
Innovation

Methods, ideas, or system contributions that make the work stand out.

General framework for sample compression bounds under real-valued, unbounded losses
Pick-To-Learn (P2L) meta-algorithm to turn any learner into a sample-compressed predictor
Empirical validation of bound tightness on random forests and multiple neural network architectures