🤖 AI Summary
Large language models (LLMs) frequently generate high-confidence hallucinations, yet existing detection methods are rarely fine-grained, generalizable, and plug-and-play at the same time. To address this, we propose the first pre-trained, architecture-aware uncertainty quantification (UQ) auxiliary head, which combines features derived from LLM self-attention maps with a Transformer-based head architecture and uses supervised learning to model per-claim confidence for claim-level hallucination detection. Our UQ head transfers zero-shot across diverse LLM families (including Mistral, Llama, and Gemma 2) as well as across languages, achieving state-of-the-art performance on both in-domain and out-of-domain prompts. Crucially, it requires no task-specific fine-tuning of, or architectural modification to, the base LLM. We release open-source code and pre-trained UQ heads for major LLM series, enabling immediate integration and reproducible research.
📝 Abstract
Large Language Models (LLMs) have the tendency to hallucinate, i.e., to sporadically generate false or fabricated information. This presents a major challenge, as hallucinations often appear highly convincing and users generally lack the tools to detect them. Uncertainty quantification (UQ) provides a framework for assessing the reliability of model outputs, aiding in the identification of potential hallucinations. In this work, we introduce pre-trained UQ heads: supervised auxiliary modules for LLMs that substantially enhance their ability to capture uncertainty compared to unsupervised UQ methods. Their strong performance stems from the powerful Transformer architecture in their design and informative features derived from LLM attention maps. Experimental evaluation shows that these heads are highly robust and achieve state-of-the-art performance in claim-level hallucination detection across both in-domain and out-of-domain prompts. Moreover, these modules demonstrate strong generalization to languages they were not explicitly trained on. We pre-train a collection of UQ heads for popular LLM series, including Mistral, Llama, and Gemma 2. We publicly release both the code and the pre-trained heads.
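To make the overall recipe concrete, here is a deliberately simplified sketch of the supervised-UQ idea: summarize a claim's rows of an attention map into a few scalar features, then train a small supervised classifier on hallucination labels. Everything here is illustrative and hypothetical, not the paper's implementation: the features (attention entropy and peak attention), the synthetic data, and the logistic-regression head (standing in for the paper's Transformer-based UQ head) are all assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def attention_features(attn, claim_token_ids):
    """Toy feature extractor (illustrative, not the paper's recipe):
    summarize one attention map over a claim's token span with the mean
    per-token attention entropy and the mean peak attention weight."""
    rows = attn[claim_token_ids]                      # (n_tokens, seq_len)
    entropy = -(rows * np.log(rows + 1e-12)).sum(-1)  # entropy of each row
    return np.array([entropy.mean(), rows.max(-1).mean()])

def make_claim(hallucinated, seq_len=32, n_tokens=5):
    """Synthetic stand-in for a real attention map: hallucinated claims get
    flatter (higher-entropy) attention rows, grounded claims peakier ones."""
    logits = rng.normal(size=(n_tokens, seq_len))
    logits *= 0.5 if hallucinated else 3.0            # temperature -> peakiness
    attn = np.exp(logits)
    attn /= attn.sum(-1, keepdims=True)               # row-wise softmax
    return attention_features(attn, np.arange(n_tokens))

# Labeled training set: 100 "hallucinated" and 100 "grounded" claims.
X = np.stack([make_claim(h) for h in [1] * 100 + [0] * 100])
y = np.array([1] * 100 + [0] * 100)

# Tiny supervised head: logistic regression fit by gradient descent,
# a minimal proxy for the pre-trained Transformer UQ head.
Xn = (X - X.mean(0)) / X.std(0)                       # standardize features
w, b = np.zeros(X.shape[1]), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(Xn @ w + b)))           # predicted confidence
    g = p - y                                         # gradient of log-loss
    w -= 0.1 * Xn.T @ g / len(y)
    b -= 0.1 * g.mean()

probs = 1.0 / (1.0 + np.exp(-(Xn @ w + b)))
acc = ((probs > 0.5) == y).mean()
```

On this toy data the two features separate the classes almost perfectly, which mirrors the paper's core claim that attention maps carry a strong, learnable uncertainty signal; the actual heads learn far richer features across layers and heads with a Transformer encoder.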