Leveraging NTPs for Efficient Hallucination Detection in VLMs

📅 2025-09-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Vision-language models (VLMs) frequently suffer from hallucination, i.e., generating text inconsistent with image content, which undermines their reliability. To address this, we propose a lightweight, real-time hallucination detection method grounded in next-token probabilities (NTPs). First, we systematically validate NTPs as an effective internal uncertainty proxy within VLMs. Second, we construct a multi-source feature discriminator by fusing linguistic NTP scores, image-guided VLM prediction confidence, and model-generated hallucination likelihood scores. Third, we train a traditional machine-learning classifier (e.g., XGBoost) on only 1,400 human-annotated samples, achieving detection accuracy comparable to strong VLM-based baselines; ensemble integration further improves performance. Our approach drastically reduces computational overhead and inference latency while maintaining high fidelity, providing an efficient, practical solution for trustworthy VLM deployment in real-world applications.

📝 Abstract
Hallucinations of vision-language models (VLMs), which are misalignments between visual content and generated text, undermine the reliability of VLMs. One common approach for detecting them employs the same VLM, or a different one, to assess generated outputs. This process is computationally intensive and increases model latency. In this paper, we explore an efficient on-the-fly method for hallucination detection by training traditional ML models over signals based on the VLM's next-token probabilities (NTPs). NTPs provide a direct quantification of model uncertainty. We hypothesize that high uncertainty (i.e., a low NTP value) is strongly associated with hallucinations. To test this, we introduce a dataset of 1,400 human-annotated statements derived from VLM-generated content, each labeled as hallucinated or not, and use it to test our NTP-based lightweight method. Our results demonstrate that NTP-based features are valuable predictors of hallucinations, enabling fast and simple ML models to achieve performance comparable to that of strong VLMs. Furthermore, augmenting these NTPs with linguistic NTPs, computed by feeding only the generated text back into the VLM, enhances hallucination detection performance. Finally, integrating hallucination prediction scores from VLMs into the NTP-based models led to better performance than using either VLMs or NTPs alone. We hope this study paves the way for simple, lightweight solutions that enhance the reliability of VLMs.
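The abstract's core signal, per-token NTPs summarized into fixed-length features for a fast classifier, can be sketched as follows. This is a minimal illustration: the specific features (mean log-probability, minimum probability, fraction of low-confidence tokens) and the threshold are assumptions, not the paper's exact feature set.

```python
import math

def ntp_features(token_probs, low_thresh=0.1):
    """Summarize a statement's next-token probabilities (NTPs) into
    fixed-length features for a lightweight hallucination detector.
    Feature choices here are illustrative, not the paper's exact design."""
    logs = [math.log(p) for p in token_probs]
    return {
        "mean_logprob": sum(logs) / len(logs),      # overall confidence
        "min_prob": min(token_probs),               # weakest token
        "frac_low": sum(p < low_thresh for p in token_probs)
                    / len(token_probs),             # share of uncertain tokens
    }

# A confidently generated statement vs. an uncertain (hallucination-prone) one:
grounded = ntp_features([0.9, 0.8, 0.95, 0.85])
uncertain = ntp_features([0.3, 0.05, 0.4, 0.08])
```

Under the paper's hypothesis, vectors like `uncertain` (low minimum probability, many low-confidence tokens) would be the ones a classifier learns to flag as hallucinated.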
Problem

Research questions and friction points this paper is trying to address.

Detecting hallucinations in vision-language models efficiently
Reducing computational cost of hallucination detection methods
Whether next-token probabilities can reliably quantify model uncertainty
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses next-token probabilities to quantify model uncertainty
Trains lightweight ML models on NTP features for detection
Augments visual NTPs with linguistic NTPs for better performance
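A minimal sketch of the fusion described above: visual NTPs, linguistic NTPs (obtained by feeding only the generated text back into the VLM), and a VLM-produced hallucination score concatenated into one feature vector for the downstream classifier. The aggregation functions and argument names are assumptions for illustration.

```python
def fused_features(visual_ntps, linguistic_ntps, vlm_score):
    """Fuse three signal families into one feature vector:
    - visual_ntps: per-token probabilities from image-conditioned generation
    - linguistic_ntps: per-token probabilities from text-only re-scoring
    - vlm_score: a VLM's own hallucination likelihood estimate
    Aggregation (min + mean per stream) is an illustrative choice."""
    def agg(probs):
        return [min(probs), sum(probs) / len(probs)]
    return agg(visual_ntps) + agg(linguistic_ntps) + [vlm_score]

# Example: weak visual confidence, stronger text-only fluency, low VLM score.
vec = fused_features([0.9, 0.4], [0.7, 0.8], vlm_score=0.2)
```

A fixed-length vector like `vec` is what a traditional classifier (e.g., XGBoost, per the summary) would consume, which is what keeps inference cheap relative to running a second VLM pass per statement.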
Ofir Azachi
Department of Data and Decision Science, Technion - Israel Institute of Technology
Kfir Eliyahu
Department of Data and Decision Science, Technion - Israel Institute of Technology
Eyal El Ani
Department of Data and Decision Science, Technion - Israel Institute of Technology
Rom Himelstein
Department of Data and Decision Science, Technion - Israel Institute of Technology
Roi Reichart
Professor of Artificial Intelligence, Technion - Israel Institute of Technology
natural language processing, machine learning, artificial intelligence, health AI, AI for Science
Yuval Pinter
Ben-Gurion University of the Negev
Natural Language Processing, Machine Learning, Information Retrieval, Linguistics
Nitay Calderon
Technion - Israel Institute of Technology
Counterfactual Generation, LLM-as-a-Judge, Concept-Based Explainability, Domain Adaptation