AggTruth: Contextual Hallucination Detection using Aggregated Attention Scores in LLMs

📅 2025-06-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address persistent contextual hallucinations in retrieval-augmented generation (RAG) settings with large language models (LLMs), this paper proposes AggTruth, an online hallucination detection method grounded in the model's internal attention mechanism. AggTruth aggregates multi-head self-attention scores over the provided context and systematically investigates how different aggregation strategies (e.g., mean, max, entropy-weighted) and attention-head selection affect detection performance; it further sharpens discriminative capability via feature selection. Experiments across multiple LLMs, including Llama-2/3, Qwen, and Phi-3, show that AggTruth performs stably in both same-task and cross-task setups, outperforming the current SOTA in multiple scenarios. It requires no external annotations and operates fully online with minimal computational overhead.
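The summary above mentions aggregating each head's attention distribution over the context with strategies such as mean, max, or entropy. A minimal sketch of such per-head aggregation is shown below; the tensor layout (`[n_heads, n_ctx]`, one generated token's attention over the passage) and the exact entropy formula are illustrative assumptions, not the paper's published definitions:

```python
import numpy as np

def aggregate_attention(attn, variant="sum"):
    """Aggregate one generated token's attention over context tokens
    into a single feature per head (illustrative sketch).

    attn: array [n_heads, n_ctx] -- attention paid to each context
          (passage) token; hypothetical layout, not the paper's exact tensors.
    Returns: array [n_heads] of aggregated scores.
    """
    if variant == "sum":      # total attention mass placed on the passage
        return attn.sum(axis=-1)
    if variant == "max":      # strongest single context-token attention
        return attn.max(axis=-1)
    if variant == "entropy":  # how diffuse the attention distribution is
        p = attn / np.clip(attn.sum(axis=-1, keepdims=True), 1e-12, None)
        return -(p * np.log(np.clip(p, 1e-12, None))).sum(axis=-1)
    raise ValueError(f"unknown variant: {variant}")
```

Stacking such per-head features across generated tokens would yield the feature vectors that a downstream detector can consume; which variant works best is exactly what the paper evaluates.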

📝 Abstract
In real-world applications, Large Language Models (LLMs) often hallucinate, even in Retrieval-Augmented Generation (RAG) settings, which poses a significant challenge to their deployment. In this paper, we introduce AggTruth, a method for online detection of contextual hallucinations by analyzing the distribution of internal attention scores in the provided context (passage). Specifically, we propose four different variants of the method, each varying in the aggregation technique used to calculate attention scores. Across all LLMs examined, AggTruth demonstrated stable performance in both same-task and cross-task setups, outperforming the current SOTA in multiple scenarios. Furthermore, we conducted an in-depth analysis of feature selection techniques and examined how the number of selected attention heads impacts detection performance, demonstrating that careful selection of heads is essential to achieve optimal results.
Problem

Research questions and friction points this paper is trying to address.

Can contextual hallucinations in LLMs be detected from internal attention scores?
Which aggregation techniques for attention-score analysis work best?
How does feature selection impact hallucination detection performance?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Detects hallucinations via aggregated attention scores
Proposes four variants of aggregation techniques
Optimizes performance with attention head selection
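The last point above notes that performance depends on which attention heads are selected. One simple way to sketch such head selection, assuming labeled examples with per-head aggregated features, is to rank heads by how strongly their feature correlates with the hallucination label; the correlation criterion here is a hypothetical stand-in, since this page does not specify the paper's actual selection technique:

```python
import numpy as np

def select_heads(features, labels, k=2):
    """Keep the k heads whose aggregated attention feature correlates
    most strongly (|Pearson r|) with a binary hallucination label.
    Illustrative criterion only -- not the paper's published method.

    features: [n_examples, n_heads] per-head aggregated attention features
    labels:   [n_examples] binary labels (1 = hallucinated)
    Returns:  indices of the k most informative heads.
    """
    X = features - features.mean(axis=0)        # center each head's feature
    y = labels - labels.mean()                  # center the labels
    denom = np.sqrt((X ** 2).sum(axis=0) * (y ** 2).sum()) + 1e-12
    corr = np.abs((X * y[:, None]).sum(axis=0) / denom)
    return np.argsort(corr)[::-1][:k]           # highest |correlation| first
```

A detector would then be trained only on the selected heads' features, which matches the paper's finding that careful head selection matters for optimal results.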
Piotr Matys
Department of Artificial Intelligence, Wroclaw Tech, Wrocław, Poland
Jan Eliasz
Department of Artificial Intelligence, Wroclaw Tech, Wrocław, Poland
Konrad Kiełczyński
Department of Artificial Intelligence, Wroclaw Tech, Wrocław, Poland
Mikołaj Langner
Department of Artificial Intelligence, Wroclaw Tech, Wrocław, Poland
Teddy Ferdinan
Department of Artificial Intelligence, Wroclaw Tech, Wrocław, Poland
Jan Kocoń
Department of Artificial Intelligence, Wroclaw University of Science and Technology
Artificial Intelligence, Natural Language Processing, Large Language Models, Transformers, Personalized NLP
Przemysław Kazienko
Politechnika Wrocławska
NLP, affective computing, wearables, machine learning, social networks