🤖 AI Summary
This work addresses multicalibration in weakly supervised learning settings, such as positive-unlabeled, unlabeled-unlabeled, and positive-confidence learning, where clean input-label pairs are unavailable and existing multicalibration methods therefore cannot be applied directly. The authors propose a unified framework that combines contamination-matrix risk rewrites with witness-based calibration constraints, yielding corrected multicalibration moments with finite-sample guarantees. Building on these estimators, they introduce weak-label multicalibration boost (WLMC), a generic post-hoc recalibration algorithm for weak supervision. Experiments across multiple weak-supervision settings evaluate multicalibration behavior and offer empirical insight into uncertainty estimation under weak supervision.
📝 Abstract
Multicalibration requires predicted scores to agree with label probabilities across rich families of subgroups and score-dependent tests, but existing methods require clean input-label pairs for evaluation and post-processing. This assumption fails in weakly supervised learning (WSL) regimes -- including positive-unlabeled, unlabeled-unlabeled, and positive-confidence learning -- where clean labels are costly or unavailable even though reliable uncertainty estimates may be crucial. We address this gap by developing estimators of multicalibration error and post-hoc correction methods for WSL settings in which clean input-label pairs are unavailable. We propose a unified framework for estimating and correcting multicalibration under weak supervision by combining contamination-matrix risk rewrites with witness-based calibration constraints, yielding corrected multicalibration moments with finite-sample guarantees. We further propose weak-label multicalibration boost (WLMC), a generic post-hoc recalibration algorithm under weak supervision. Finally, we conduct experiments across multiple weak-supervision settings to evaluate multicalibration behavior and offer empirical insight into uncertainty estimation under weak supervision.
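To make the idea of a risk rewrite concrete, the sketch below works through the simplest case, positive-unlabeled data, using the standard identity E[w·y] = π·E₊[w] for binary labels. It is an illustration only and not the paper's estimator: the function names, the synthetic Gaussian data, the witness function, and the assumption that the class prior π is known are all hypothetical choices made for this example.

```python
import numpy as np

def clean_moment(w, f, x, y):
    """Moment with clean labels: mean of w(x, f(x)) * (y - f(x))."""
    v = f(x)
    return float(np.mean(w(x, v) * (y - v)))

def pu_moment(w, f, x_pos, x_unl, prior):
    """Same moment from positive and unlabeled samples only.

    Uses E[w * y] = prior * E_pos[w] and E[w * f] = E_unl[w * f],
    where `prior` = P(y = 1) is assumed known (an assumption of this sketch).
    """
    v_pos, v_unl = f(x_pos), f(x_unl)
    return float(prior * np.mean(w(x_pos, v_pos))
                 - np.mean(w(x_unl, v_unl) * v_unl))

rng = np.random.default_rng(0)
prior, n = 0.4, 200_000

# Synthetic 1-D data: positives ~ N(+1, 1), negatives ~ N(-1, 1).
y = (rng.random(n) < prior).astype(float)
x = rng.normal(loc=2.0 * y - 1.0, scale=1.0)

# Independent sample from the positive class-conditional distribution.
x_pos = rng.normal(loc=1.0, scale=1.0, size=n)

# A deliberately miscalibrated predictor (hypothetical).
f = lambda x: 1.0 / (1.0 + np.exp(-0.5 * x))

# Witness: indicator of subgroup {x > 0} intersected with score bin {f >= 0.6}.
w = lambda x, v: ((x > 0) & (v >= 0.6)).astype(float)

m_clean = clean_moment(w, f, x, y)          # uses clean labels y
m_pu = pu_moment(w, f, x_pos, x, prior)     # ignores y entirely
print(f"clean: {m_clean:.4f}  PU rewrite: {m_pu:.4f}")
```

With enough samples the two estimates should agree up to sampling noise, which is what allows a calibration violation to be detected without clean labels in this toy setting. Other weak-supervision regimes would need a different rewrite, which the abstract attributes to the contamination matrix.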