🤖 AI Summary
To address the challenge of hallucination detection in RAG systems—where existing LLM-based methods rely heavily on large-scale annotated data and thus suffer from poor industrial deployability—this paper proposes a lightweight, data-efficient framework. Rather than introducing a new detector, it pairs efficient linear classifiers with dimensionality reduction (e.g., PCA/UMAP) on top of two SOTA hallucination detection paradigms: Lookback Lens, which models attention-head dynamics, and probing, which decodes internal model representations. Evaluated on standard question-answering RAG benchmarks, the method achieves over 92% accuracy using only 250 labeled samples—matching strong closed-source LLM baselines while cutting annotation requirements by over 90%. Its core contribution is robust hallucination assessment at minimal labeling cost, substantially alleviating the data bottleneck of supervised detection and enabling scalable, real-world deployment.
📝 Abstract
Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) systems are increasingly deployed in industry applications, yet their reliability remains hampered by challenges in detecting hallucinations. While supervised state-of-the-art (SOTA) methods that leverage LLM hidden states -- such as activation tracing and representation analysis -- show promise, their dependence on extensively annotated datasets limits scalability in real-world applications. This paper addresses the critical bottleneck of data annotation by investigating the feasibility of reducing training data requirements for two SOTA hallucination detection frameworks: Lookback Lens, which analyzes attention head dynamics, and probing-based approaches, which decode internal model representations. We propose a methodology combining efficient classification algorithms with dimensionality reduction techniques to minimize sample size demands while maintaining competitive performance. Evaluations on standardized question-answering RAG benchmarks show that our approach achieves performance comparable to strong proprietary LLM-based baselines with only 250 training samples. These results highlight the potential of lightweight, data-efficient paradigms for industrial deployment, particularly in annotation-constrained scenarios.