A Head to Predict and a Head to Question: Pre-trained Uncertainty Quantification Heads for Hallucination Detection in LLM Outputs

📅 2025-05-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently generate high-confidence hallucinations, yet existing detection methods lack fine-grained, generalizable, plug-and-play solutions. To address this, the authors propose pre-trained uncertainty quantification (UQ) heads: supervised auxiliary modules that combine a Transformer architecture with features derived from the base LLM's attention maps to estimate per-claim confidence for claim-level hallucination detection. The UQ heads transfer across diverse LLM families (including Mistral, Llama, and Gemma 2) and generalize to languages they were not explicitly trained on, achieving state-of-the-art performance on both in-domain and out-of-domain prompts. Crucially, they require no fine-tuning of, or architectural modification to, the base LLM. The authors release open-source code and pre-trained UQ heads for popular LLM series, enabling immediate integration and reproducible research.

📝 Abstract
Large Language Models (LLMs) have the tendency to hallucinate, i.e., to sporadically generate false or fabricated information. This presents a major challenge, as hallucinations often appear highly convincing and users generally lack the tools to detect them. Uncertainty quantification (UQ) provides a framework for assessing the reliability of model outputs, aiding in the identification of potential hallucinations. In this work, we introduce pre-trained UQ heads: supervised auxiliary modules for LLMs that substantially enhance their ability to capture uncertainty compared to unsupervised UQ methods. Their strong performance stems from the powerful Transformer architecture in their design and informative features derived from LLM attention maps. Experimental evaluation shows that these heads are highly robust and achieve state-of-the-art performance in claim-level hallucination detection across both in-domain and out-of-domain prompts. Moreover, these modules demonstrate strong generalization to languages they were not explicitly trained on. We pre-train a collection of UQ heads for popular LLM series, including Mistral, Llama, and Gemma 2. We publicly release both the code and the pre-trained heads.
Problem

Research questions and friction points this paper is trying to address.

Detecting false information in LLM outputs using uncertainty quantification
Enhancing hallucination detection with pre-trained UQ heads
Improving robustness and generalization in uncertainty assessment across languages
Innovation

Methods, ideas, or system contributions that make the work stand out.

Pre-trained UQ heads enhance uncertainty detection
Utilize Transformer architecture and attention maps
Robust performance across domains and languages
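To make the method concrete, here is a simplified sketch of the idea: derive features from the base LLM's attention maps for the tokens of a claim, then score the claim's confidence with a small supervised head. Everything below is illustrative, not the paper's implementation: the feature choices (attention entropy, attention mass on the prompt) and the logistic scorer standing in for the Transformer-based UQ head are assumptions for the sketch.

```python
import numpy as np

def attention_features(attn, span):
    """Derive simple per-claim features from LLM attention maps.

    attn: array of shape (layers, heads, T, T) of softmaxed attention
          weights (hypothetical input from the base LLM).
    span: (start, end) token indices of the claim to score.
    """
    s, e = span
    # Entropy of each claim token's attention distribution: diffuse
    # attention is one plausible uncertainty cue (illustrative choice).
    p = np.clip(attn[..., s:e, :], 1e-12, 1.0)
    entropy = -(p * np.log(p)).sum(axis=-1)      # (layers, heads, span)
    # Attention mass the claim tokens place on the prompt region.
    prompt_mass = attn[..., s:e, :s].sum(axis=-1)
    # Pool over layers, heads, and the claim span into a feature vector.
    return np.array([entropy.mean(), entropy.std(),
                     prompt_mass.mean(), prompt_mass.std()])

def claim_confidence(feats, w, b):
    """Logistic scorer standing in for the pre-trained UQ head
    (the paper trains a Transformer-based module instead)."""
    return 1.0 / (1.0 + np.exp(-(feats @ w + b)))

# Toy usage with random "attention maps": 2 layers, 4 heads, 10 tokens,
# claim spanning tokens 6..10. Weights would come from supervised training.
rng = np.random.default_rng(0)
logits = rng.normal(size=(2, 4, 10, 10))
attn = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
f = attention_features(attn, span=(6, 10))
score = claim_confidence(f, w=rng.normal(size=4), b=0.0)
print(float(score))  # confidence in (0, 1); low values flag hallucinations
```

Because the head only consumes attention-derived features, the base LLM stays frozen, which is what makes the pre-trained heads plug-and-play across prompts, domains, and languages.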