CoCoA: A Generalized Approach to Uncertainty Quantification by Integrating Confidence and Consistency of LLM Outputs

📅 2025-02-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing LLM uncertainty quantification (UQ) methods exhibit insufficient stability across certain tasks—sometimes even underperforming simple baselines. This work identifies distinctive characteristics of LLMs as probabilistic models and proposes a unified UQ framework that jointly leverages token-level confidence scores and semantic consistency across multiple sampled outputs. Its core innovation is a tunable confidence–consistency coupling mechanism, which achieves robust calibration via adaptive weighted fusion of these complementary signals. Evaluated on question answering, abstractive summarization, and machine translation, the method consistently outperforms state-of-the-art UQ approaches, yielding significant improvements in both expected calibration error (ECE) and area under the risk-coverage curve (AURC) and demonstrating the effectiveness and generalizability of jointly modeling confidence and consistency for reliable LLM uncertainty estimation.

📝 Abstract
Uncertainty quantification (UQ) methods for Large Language Models (LLMs) encompass a variety of approaches, with two major types being particularly prominent: information-based methods, which focus on model confidence expressed as token probabilities, and consistency-based methods, which assess the semantic relationship between multiple outputs generated via repeated sampling. Several recent methods have combined these two approaches and shown impressive performance in various applications. However, they sometimes fail to outperform much simpler baseline methods. Our investigation reveals distinctive characteristics of LLMs as probabilistic models, which help to explain why these UQ methods underperform in certain tasks. Based on these findings, we propose a new way of synthesizing model confidence and output consistency that leads to a family of efficient and robust UQ methods. We evaluate our approach across a variety of tasks, including question answering, abstractive summarization, and machine translation, demonstrating sizable improvements over state-of-the-art UQ approaches.
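To make the two signal types concrete, the following is a minimal sketch of a confidence–consistency fusion score. It is an illustrative assumption, not the paper's actual formula: `token_logprobs` stands in for an information-based confidence signal (mean token probability of the greedy answer), `similarities` for a consistency signal (e.g. pairwise semantic similarities among sampled outputs), and `alpha` is a hypothetical fusion weight analogous to the paper's tunable coupling.

```python
import math

def cocoa_uncertainty(token_logprobs, similarities, alpha=0.5):
    """Illustrative confidence-consistency fusion (not the paper's exact method).

    token_logprobs: per-token log-probabilities of the generated answer.
    similarities:   semantic similarities (in [0, 1]) between sampled outputs.
    alpha:          hypothetical weight trading off the two signals.
    Returns an uncertainty score in [0, 1]; higher means less reliable.
    """
    # Information-based signal: mean token probability of the answer.
    confidence = math.exp(sum(token_logprobs) / len(token_logprobs))
    # Consistency-based signal: average agreement among repeated samples.
    consistency = sum(similarities) / len(similarities)
    # Weighted fusion; a higher combined score means lower uncertainty.
    return 1.0 - (alpha * confidence + (1 - alpha) * consistency)
```

For example, an answer with high token probabilities and highly agreeing samples (`cocoa_uncertainty([-0.1, -0.2], [0.9, 0.8])`) scores lower uncertainty than one with low probabilities and divergent samples (`cocoa_uncertainty([-2.0, -3.0], [0.3, 0.2])`).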
Problem

Research questions and friction points this paper is trying to address.

Uncertainty quantification in LLMs
Integrating confidence and consistency
Improving UQ method performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates confidence and consistency
New UQ methods for LLMs
Improves on state-of-the-art UQ methods across tasks