Visual hallucination detection in large vision-language models via evidential conflict

📅 2025-06-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large Vision-Language Models (LVLMs) suffer from visual hallucinations—semantic inconsistencies between images and text—largely due to deficiencies in high-level reasoning. Method: We propose PRE-HAL, the first benchmark jointly evaluating perceptual fidelity and reasoning capability, and introduce a novel uncertainty detection method grounded in Dempster-Shafer theory. This method uniquely incorporates an evidence conflict mechanism, employing lightweight mass functions to model reasoning contradictions among high-order features, enabling fine-grained identification of hallucinations across instance-, scene-, and relation-level semantics. Contribution/Results: Experiments on LLaVA-v1.5, mPLUG-Owl2, and mPLUG-Owl3 demonstrate that our approach achieves an average 4–10% improvement in AUROC over five state-of-the-art uncertainty baselines, significantly enhancing robustness and generalization for hallucination detection under complex semantic conditions.

📝 Abstract
Despite the remarkable multimodal capabilities of Large Vision-Language Models (LVLMs), discrepancies often occur between visual inputs and textual outputs, a phenomenon we term visual hallucination. This critical reliability gap poses substantial risks in safety-critical Artificial Intelligence (AI) applications, necessitating a comprehensive evaluation benchmark and effective detection methods. Firstly, we observe that existing visual-centric hallucination benchmarks mainly assess LVLMs from a perception perspective, overlooking hallucinations arising from advanced reasoning capabilities. We develop the Perception-Reasoning Evaluation Hallucination (PRE-HAL) dataset, which enables the systematic evaluation of both perception and reasoning capabilities of LVLMs across multiple visual semantics, such as instances, scenes, and relations. Comprehensive evaluation with this new benchmark exposed more visual vulnerabilities, particularly in the more challenging task of relation reasoning. To address this issue, we propose, to the best of our knowledge, the first Dempster-Shafer theory (DST)-based visual hallucination detection method for LVLMs through uncertainty estimation. This method aims to efficiently capture the degree of conflict in high-level features at the model inference phase. Specifically, our approach employs simple mass functions to mitigate the computational complexity of evidence combination on power sets. We conduct an extensive evaluation of state-of-the-art LVLMs, LLaVA-v1.5, mPLUG-Owl2, and mPLUG-Owl3, with the new PRE-HAL benchmark. Experimental results indicate that our method outperforms five baseline uncertainty metrics, achieving average AUROC improvements of 4%, 10%, and 7% across three LVLMs. Our code is available at https://github.com/HT86159/Evidential-Conflict.
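The core mechanism the abstract describes, measuring the degree of conflict between pieces of evidence under Dempster-Shafer theory using simple mass functions, can be sketched in a few lines. The code below is an illustrative, standard implementation of Dempster's rule of combination, not the paper's actual method; the frame of discernment, the mass values, and the function names are hypothetical. It shows why simple mass functions (all mass on one singleton plus the full frame) keep the computation cheap: each combination touches only a handful of focal elements instead of the full power set.

```python
from itertools import product


def simple_mass(frame, element, s):
    """Simple mass function: mass s on one singleton, 1 - s on the whole frame.

    `frame`, `element`, and `s` are illustrative; the paper derives its masses
    from high-level model features.
    """
    return {frozenset([element]): s, frozenset(frame): 1.0 - s}


def combine_dempster(m1, m2):
    """Dempster's rule of combination for two mass functions given as
    {frozenset(focal_element): mass} dicts over the same frame.

    Returns the normalized combined mass function and the conflict K,
    i.e. the total mass assigned to contradictory (disjoint) evidence.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # disjoint focal elements: evidence disagrees
    if conflict >= 1.0:
        raise ValueError("total conflict: evidence is fully contradictory")
    # Normalize the surviving mass by 1 - K (Dempster normalization).
    return {a: m / (1.0 - conflict) for a, m in combined.items()}, conflict
```

For two simple mass functions focused on different singletons, the conflict reduces to the product of their committed masses: with `frame = {"cat", "dog"}`, combining `simple_mass(frame, "cat", 0.8)` and `simple_mass(frame, "dog", 0.6)` yields K = 0.8 × 0.6 = 0.48, while two masses backing the same singleton yield K = 0. A high K is the kind of evidential-conflict signal the paper thresholds to flag hallucinations.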
Problem

Research questions and friction points this paper is trying to address.

How to reliably detect visual hallucinations (image-text inconsistencies) produced by LVLMs
Existing visual-centric benchmarks assess perception but overlook hallucinations arising from advanced reasoning
Uncertainty-based detection must capture evidence conflict without the cost of combining evidence over full power sets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed PRE-HAL dataset for perception-reasoning evaluation
Proposed DST-based hallucination detection via uncertainty estimation
Employed simple mass functions to reduce computational complexity
Tao Huang
Beijing Key Lab of Traffic Data Mining and Embodied Intelligence, Beijing, China; State Key Laboratory of Advanced Rail Autonomous Operation, Beijing, China; School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China
Zhekun Liu
Beijing Key Lab of Traffic Data Mining and Embodied Intelligence, Beijing, China; State Key Laboratory of Advanced Rail Autonomous Operation, Beijing, China; School of Computer Science and Technology, Beijing Jiaotong University, Beijing, China
Rui Wang
State Key Laboratory of Advanced Rail Autonomous Operation, Beijing, China; School of Automation and Intelligence, Beijing Jiaotong University, Beijing, China
Yang Zhang
School of Electronic and Information Engineering, Beijing Jiaotong University, Beijing, China
Liping Jing
Beijing Jiaotong University
Machine Learning · Data Mining