Reliable Programmatic Weak Supervision with Confidence Intervals for Label Probabilities

📅 2025-08-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
In programmatic weak supervision, existing methods lack mechanisms to quantify the reliability of label probability predictions. This work addresses that gap by modeling the arbitrary behaviors and types of weak labeling functions via uncertainty sets of distributions, enabling, for the first time, confidence intervals for predicted label probabilities. The authors employ robust optimization to fuse heterogeneous weak-supervision signals whose dependency structure is unknown, yielding interpretable, quantified assessments of prediction uncertainty. Experiments across multiple benchmark datasets demonstrate improvements in both predictive performance and calibration, supporting the practical utility and generalizability of the proposed confidence intervals. The core contributions are threefold: (1) the first framework for estimating confidence intervals of label probabilities under weak supervision; (2) robust fusion of heterogeneous weak signals without assuming known dependency structures; and (3) a theoretically rigorous approach that simultaneously ensures statistical soundness and empirical prediction reliability.
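To make the idea concrete, here is a minimal toy sketch (not the paper's actual method, which uses uncertainty sets of distributions and robust optimization) of how interval-valued knowledge about labeling functions can yield an interval for a label probability. It assumes a naive-Bayes vote model where each LF votes +1, -1, or 0 (abstain) and its accuracy is only known to lie in an interval `[a_lo, a_hi]`; the function names and the accuracy intervals are illustrative assumptions.

```python
import math

def posterior_interval(votes, acc_intervals, prior=0.5):
    """Interval for P(y = +1 | votes) when each LF's accuracy is only
    known to lie in [a_lo, a_hi]. votes[i] in {+1, -1, 0} (0 = abstain).
    Toy naive-Bayes model, not the paper's robust-optimization method."""
    def posterior(accs):
        log_odds = math.log(prior / (1 - prior))
        for v, a in zip(votes, accs):
            if v == +1:
                log_odds += math.log(a / (1 - a))   # vote agrees with y = +1
            elif v == -1:
                log_odds += math.log((1 - a) / a)   # vote disagrees
        return 1.0 / (1.0 + math.exp(-log_odds))

    # The posterior is monotone in each LF's accuracy, so the extremes
    # occur at the interval endpoints: positive voters at a_lo and
    # negative voters at a_hi minimize P(y = +1), and vice versa.
    lo = posterior([alo if v == +1 else ahi
                    for v, (alo, ahi) in zip(votes, acc_intervals)])
    hi = posterior([ahi if v == +1 else alo
                    for v, (alo, ahi) in zip(votes, acc_intervals)])
    return lo, hi

# Example: three LFs, two vote +1 and one votes -1, accuracies in [0.6, 0.9].
lo, hi = posterior_interval([+1, +1, -1], [(0.6, 0.9)] * 3)
```

A wide `[lo, hi]` interval flags examples where the weak signals conflict or the LFs are poorly characterized, which is exactly the kind of reliability assessment the paper formalizes.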

📝 Abstract
The accurate labeling of datasets is often both costly and time-consuming. Given an unlabeled dataset, programmatic weak supervision obtains probabilistic predictions for the labels by leveraging multiple weak labeling functions (LFs) that provide rough guesses for labels. Weak LFs commonly provide guesses with assorted types and unknown interdependences that can result in unreliable predictions. Furthermore, existing techniques for programmatic weak supervision cannot provide assessments for the reliability of the probabilistic predictions for labels. This paper presents a methodology for programmatic weak supervision that can provide confidence intervals for label probabilities and obtain more reliable predictions. In particular, the methods proposed use uncertainty sets of distributions that encapsulate the information provided by LFs with unrestricted behavior and typology. Experiments on multiple benchmark datasets show the improvement of the presented methods over the state-of-the-art and the practicality of the confidence intervals presented.
Problem

Research questions and friction points this paper is trying to address.

Estimating label probabilities with confidence intervals
Handling unreliable weak labeling functions
Improving weak supervision prediction reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Confidence intervals for label probabilities
Uncertainty sets of distributions
Leveraging unrestricted weak labeling functions