FIHA: Autonomous Hallucination Evaluation in Vision-Language Models with Davidson Scene Graphs

📅 2024-09-20
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Problem: Evaluating hallucinations in Large Vision-Language Models (LVLMs) is costly, and existing evaluations lack comprehensive coverage. Method: This paper proposes FIHA, a fine-grained, LLM-free, and annotation-free automatic evaluation framework. It introduces what it describes as the first structured modeling of dependencies between hallucination types, using Davidson Scene Graphs (DSGs); assesses hallucination along two paths (image-based and caption-based); evaluates three aspects (relations, attributes, and the dependencies between them); and draws images from both MSCOCO and Foggy to cover more than one domain. Contribution/Results: The authors release FIHA-v1, described as the first fine-grained, structured hallucination benchmark, and use it to expose limitations of mainstream LVLMs in relational and attribute dependency reasoning. Code and data are released to support low-cost, reproducible hallucination evaluation.

📝 Abstract
The rapid development of Large Vision-Language Models (LVLMs) often comes with widespread hallucination issues, making cost-effective and comprehensive assessment increasingly vital. Current approaches mainly rely on costly annotations and are not comprehensive in terms of evaluating all aspects, such as relations, attributes, and dependencies between aspects. Therefore, we introduce FIHA (autonomous Fine-graIned Hallucination evAluation in LVLMs), which can assess hallucination in LVLMs in an LLM-free and annotation-free way and model the dependency between different types of hallucinations. FIHA can generate Q&A pairs on any image dataset at minimal cost, enabling hallucination assessment from both image and caption. Based on this approach, we introduce a benchmark called FIHA-v1, which consists of diverse questions on various images from MSCOCO and Foggy. Furthermore, we use the Davidson Scene Graph (DSG) to organize the structure among Q&A pairs, which increases the reliability of the evaluation. We evaluate representative models using FIHA-v1, highlighting their limitations and challenges. We have released our code and data.
Problem

Research questions and friction points this paper is trying to address.

Evaluating hallucination issues in LVLMs cost-effectively and comprehensively
Assessing relations, attributes, and dependencies between aspects in LVLMs
Providing an LLM-free and annotation-free hallucination evaluation method
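
As a rough illustration of what LLM-free, annotation-free question generation could look like, the sketch below turns scene facts that have already been extracted from an image or caption into yes/no Q&A pairs using fixed templates. The `Fact` schema, the `TEMPLATES` table, and `facts_to_questions` are hypothetical names chosen for this example; this summary does not describe FIHA's actual extraction or generation pipeline, so treat this as one plausible shape rather than the authors' method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Fact:
    """A hypothetical scene fact: an entity, an attribute, or a relation."""
    kind: str                 # "entity", "attribute", or "relation"
    args: Tuple[str, ...]     # e.g. ("dog",) or ("dog", "brown")


# Fixed question templates, one per fact kind (an assumption for illustration).
TEMPLATES = {
    "entity": "Is there a {0} in the image?",
    "attribute": "Is the {0} {1}?",
    "relation": "Is the {0} {1} the {2}?",
}


def facts_to_questions(facts: List[Fact]) -> List[Tuple[str, str]]:
    """Fill the template for each fact; every extracted fact is assumed true,
    so the reference answer is always "yes" in this sketch."""
    qa_pairs = []
    for fact in facts:
        question = TEMPLATES[fact.kind].format(*fact.args)
        qa_pairs.append((question, "yes"))
    return qa_pairs


facts = [
    Fact("entity", ("dog",)),
    Fact("attribute", ("dog", "brown")),
    Fact("relation", ("dog", "next to", "bench")),
]
qa_pairs = facts_to_questions(facts)
for question, answer in qa_pairs:
    print(question, "->", answer)
```

Because nothing here calls a language model or reads human labels, the cost per image is dominated by whatever produces the facts. A fuller benchmark would also need questions whose reference answer is "no", which this sketch does not generate.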
Innovation

Methods, ideas, or system contributions that make the work stand out.

Autonomous hallucination evaluation without an LLM or human annotation
Generates Q&A pairs on any image dataset at minimal cost
Uses the Davidson Scene Graph to organize Q&A pairs and increase evaluation reliability
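
To make the role of question dependencies concrete, here is a minimal sketch of one way a scene-graph structure over Q&A pairs can be used at scoring time: a question counts as answered correctly only if every question it depends on was also answered correctly. The function name, the question IDs, and the gating rule itself are assumptions for illustration, not a description of FIHA's scoring. The sketch also assumes the dependency graph is acyclic.

```python
from typing import Dict, List


def dependency_aware_scores(
    correct: Dict[str, bool],
    parents: Dict[str, List[str]],
) -> Dict[str, bool]:
    """Gate each question's result on its ancestors.

    correct: question id -> whether the model answered it correctly.
    parents: question id -> ids of the questions it depends on.
    Assumes the dependency graph has no cycles.
    """
    resolved: Dict[str, bool] = {}

    def resolve(qid: str) -> bool:
        if qid not in resolved:
            # A question only counts if it and all of its parents are correct.
            resolved[qid] = correct[qid] and all(
                resolve(parent) for parent in parents.get(qid, [])
            )
        return resolved[qid]

    return {qid: resolve(qid) for qid in correct}


# Hypothetical example: the model misses the entity question about the dog,
# so dependent attribute and relation questions are not credited.
correct = {
    "dog": False,
    "dog_brown": True,
    "bench": True,
    "dog_next_to_bench": True,
}
parents = {
    "dog_brown": ["dog"],
    "dog_next_to_bench": ["dog", "bench"],
}
scores = dependency_aware_scores(correct, parents)
print(scores)
```

The design choice here is that a "yes" to "Is the dog brown?" carries little information if the model has already denied that a dog is present, so crediting it would overstate reliability.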
Bowen Yan
University of Texas at Dallas, Richardson, United States
Zhengsong Zhang
University of Texas at Dallas, Richardson, United States
Liqiang Jing
University of Texas at Dallas
Multimedia Analysis, Multimodal, Natural Language Processing
Eftekhar Hossain
University of Texas at Dallas, Richardson, United States
Xinya Du
University of Texas at Dallas, CS; UIUC CS; Cornell University, CS
Large Language Models, Natural Language Processing, Deep Learning, Multimodality