Calibrating Verbalized Confidence with Self-Generated Distractors

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently produce miscalibrated confidence scores: overconfident on incorrect outputs and underconfident on correct ones, which undermines reliability and safety. To address this, the paper proposes DINCO, a lightweight, inference-time confidence calibration method. DINCO has the model verbalize its confidence independently over its answer and several self-generated distractors, normalizes by the total verbalized confidence, and augments this validator-side estimate with a consistency-based generator-side estimate, exploiting generator-validator disagreement. Its key innovation is correcting overconfidence via distractor-normalized confidence, requiring no additional training or external annotations. Experiments show that DINCO with only 10 inference calls outperforms self-consistency baselines that use 100 calls and improves Expected Calibration Error (ECE) by up to 37%. Moreover, DINCO yields smoother, more discriminative confidence estimates while substantially reducing deployment overhead.

📝 Abstract
Calibrated confidence estimates are necessary for large language model (LLM) outputs to be trusted by human users. While LLMs can express their confidence in human-interpretable ways, verbalized LLM-generated confidence scores have empirically been found to be miscalibrated, reporting high confidence on instances with low accuracy and thereby harming trust and safety. We hypothesize that this overconfidence often stems from a given LLM's heightened suggestibility when faced with claims that it encodes little information about; we empirically validate this hypothesis, finding more suggestibility on lower-accuracy claims. Building on this finding, we introduce Distractor-Normalized Coherence (DINCO), which estimates and accounts for an LLM's suggestibility bias by having the model verbalize its confidence independently across several self-generated distractors (i.e. alternative claims), and normalizes by the total verbalized confidence. To further improve calibration, we leverage generator-validator disagreement, augmenting normalized validator confidence with a consistency-based estimate of generator confidence. Here, we frame the popular approach of self-consistency as leveraging coherence across sampled generations, and normalized verbalized confidence as leveraging coherence across validations on incompatible claims, allowing us to integrate these complementary dimensions of coherence into DINCO. Moreover, our analysis shows that DINCO provides less saturated -- and therefore more usable -- confidence estimates, and that further sampling alone cannot close the gap between DINCO and baselines, with DINCO at 10 inference calls outperforming self-consistency at 100.
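The normalization step described in the abstract can be sketched in a few lines. This is an illustrative sketch, not the paper's implementation: the function name is hypothetical, and the step of querying the LLM for each verbalized confidence score is abstracted into plain floats.

```python
def distractor_normalized_confidence(answer_conf: float,
                                     distractor_confs: list[float]) -> float:
    """Normalize the model's verbalized confidence in its answer by the total
    confidence it assigns across the answer and its self-generated distractors.

    `answer_conf` and each entry of `distractor_confs` are verbalized
    confidence scores in [0, 1], each obtained from an independent LLM query
    (the querying itself is abstracted away in this sketch).
    """
    total = answer_conf + sum(distractor_confs)
    if total == 0:
        return 0.0  # the model expressed no confidence in any claim
    return answer_conf / total


# A suggestible model that also reports high confidence on incompatible
# distractors has its score pulled down by the normalization:
print(distractor_normalized_confidence(0.9, [0.8, 0.7, 0.6]))  # 0.9 / 3.0 = 0.3
```

The intuition matches the abstract's hypothesis: on low-accuracy claims the model is more suggestible, so it assigns confidence to distractors too, and the normalized score drops accordingly.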
Problem

Research questions and friction points this paper is trying to address.

Calibrating verbalized confidence scores from large language models
Addressing overconfidence bias through self-generated distractor claims
Improving trustworthiness via normalized coherence and generator-validator disagreement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses self-generated distractors to normalize confidence scores
Integrates generator-validator disagreement for calibration
Combines coherence across generations and validations
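The bullets above can be combined into a single score roughly as follows. This is a minimal sketch under stated assumptions: the equal-weight interpolation and the function names are illustrative, not the paper's exact combination rule.

```python
from collections import Counter


def generator_confidence(samples: list[str], answer: str) -> float:
    """Self-consistency estimate: fraction of sampled generations that agree
    with the candidate answer (coherence across generations)."""
    return Counter(samples)[answer] / len(samples)


def combined_confidence(validator_conf: float,
                        generator_conf: float,
                        weight: float = 0.5) -> float:
    """Blend the distractor-normalized validator confidence with the
    consistency-based generator confidence. Equal weighting is an
    illustrative assumption, not the paper's specified rule."""
    return weight * validator_conf + (1.0 - weight) * generator_conf


samples = ["Paris", "Paris", "Lyon", "Paris"]
g = generator_confidence(samples, "Paris")  # 3 of 4 samples agree -> 0.75
print(combined_confidence(0.3, g))
```

Because the two signals come from different failure modes (sampling variance on the generator side, suggestibility on the validator side), blending them is what lets 10 inference calls compete with 100 calls of self-consistency alone.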