Selective Labeling with False Discovery Rate Control

📅 2025-10-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large-scale data labeling relies heavily on expert annotators, incurring prohibitive costs; while AI-based automatic labeling is cost-effective, its error rate is uncontrolled. Existing selective labeling approaches lack theoretical guarantees, making it difficult to ensure the quality of AI-generated labels on selected subsets. This paper proposes Conformal Labeling, the first method to incorporate false discovery rate (FDR) control into selective labeling, providing provable quality guarantees for AI predictions—specifically, ensuring that the proportion of erroneous labels among selected predictions remains below a user-specified threshold with statistical rigor. Our approach constructs conformal p-values from calibration data and employs a data-dependent threshold to retain only high-confidence predictions. Evaluated on image classification, text tagging, and large language model question-answering tasks, Conformal Labeling achieves exact FDR control and high selection power, substantially enhancing the reliability and practical utility of AI-assisted labeling.

📝 Abstract
Obtaining high-quality labels for large datasets is expensive, requiring massive annotations from human experts. While AI models offer a cost-effective alternative by predicting labels, their label quality is compromised by unavoidable labeling errors. Existing methods mitigate this issue through selective labeling, where AI labels a subset and humans label the remainder. However, these methods lack theoretical guarantees on the quality of AI-assigned labels, often resulting in unacceptably high labeling error within the AI-labeled subset. To address this, we introduce **Conformal Labeling**, a novel method to identify instances where AI predictions can be provably trusted. This is achieved by controlling the false discovery rate (FDR), the proportion of incorrect labels within the selected subset. In particular, we construct a conformal p-value for each test instance by comparing the AI model's predicted confidence to those of calibration instances mislabeled by the model. Then, we select test instances whose p-values are below a data-dependent threshold, certifying the AI model's predictions as trustworthy. We provide theoretical guarantees that Conformal Labeling controls the FDR below the nominal level, ensuring that a predefined fraction of AI-assigned labels is correct on average. Extensive experiments demonstrate that our method achieves tight FDR control with high power across various tasks, including image and text labeling, and LLM QA.
Problem

Research questions and friction points this paper is trying to address.

Controlling false discovery rate in selective AI labeling systems
Providing theoretical guarantees for AI-assigned label quality
Identifying trustworthy AI predictions using conformal p-values
Innovation

Methods, ideas, or system contributions that make the work stand out.

Controls false discovery rate for AI labels
Uses conformal p-values from confidence comparisons
Selects trustworthy predictions via data-dependent threshold
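The selection pipeline described above can be sketched in code. This is a minimal illustration, not the paper's exact procedure: it assumes the p-value compares each test confidence against confidences of mislabeled calibration points (with the usual plus-one correction), and it stands in a Benjamini-Hochberg step for the paper's data-dependent threshold. All function and variable names are illustrative.

```python
import numpy as np

def conformal_pvalues(cal_conf, cal_correct, test_conf):
    """P-value per test instance: fraction of mislabeled calibration
    points at least as confident as the test point (plus-one corrected).
    Small p-value => the model is unusually confident here."""
    err_conf = cal_conf[~cal_correct]          # confidences where AI was wrong
    n0 = len(err_conf)
    counts = (err_conf[None, :] >= test_conf[:, None]).sum(axis=1)
    return (1 + counts) / (n0 + 1)

def bh_select(pvals, alpha):
    """Benjamini-Hochberg: keep the largest k with p_(k) <= k*alpha/m,
    select the k instances with the smallest p-values."""
    m = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= alpha * np.arange(1, m + 1) / m
    selected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])       # largest index passing the cutoff
        selected[order[: k + 1]] = True
    return selected
```

Instances flagged by `bh_select` would receive the AI label; the rest fall back to human annotation, which is how the FDR guarantee translates into a labeling budget.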
Huipeng Huang
Department of Statistics and Data Science, Southern University of Science and Technology
Wenbo Liao
Department of Statistics and Data Science, Southern University of Science and Technology
Huajun Xi
Department of Statistics and Data Science, Southern University of Science and Technology
Hao Zeng
Department of Statistics and Data Science, Southern University of Science and Technology
Mengchen Zhao
South China University of Technology
Reinforcement Learning, Multi-Agent Systems, Generative Decision Making, LLM Agents
Hongxin Wei
Southern University of Science and Technology (SUSTech)
Reliable Machine Learning, Uncertainty Estimation, Statistics