Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models

📅 2024-08-18
🏛️ arXiv.org
📈 Citations: 6
Influential: 0
🤖 AI Summary
Relation hallucination—the erroneous generation of semantic relationships among objects—has long been overlooked in multimodal large language models (MLLMs); existing evaluation benchmarks suffer from ambiguous definitions, severe data bias, and a lack of dedicated mitigation strategies. Method: We propose Reefknot, the first comprehensive benchmark for relation hallucination, comprising 21K real-world samples. It formally defines the problem from dual perceptual and cognitive perspectives and constructs a relation-centric corpus derived from Visual Genome. Our method introduces relational triple modeling, scene graph analysis, and confidence calibration—the first dedicated mitigation framework for this issue. Results: Experiments on Reefknot and two external benchmarks demonstrate an average 9.75% reduction in relation hallucination rates, exposing systematic relational reasoning deficiencies across mainstream MLLMs. Reefknot establishes a reproducible, fine-grained evaluation paradigm for trustworthy multimodal intelligence.

📝 Abstract
Hallucination issues continue to affect multimodal large language models (MLLMs), with existing research mainly addressing object-level or attribute-level hallucinations, neglecting the more complex relation hallucinations that require advanced reasoning. Current benchmarks for relation hallucinations lack detailed evaluation and effective mitigation, and their datasets often suffer from biases due to systematic annotation processes. To address these challenges, we introduce Reefknot, a comprehensive benchmark targeting relation hallucinations, comprising over 20,000 real-world samples. We provide a systematic definition of relation hallucinations, integrating perceptive and cognitive perspectives, and construct a relation-based corpus using the Visual Genome scene graph dataset. Our comparative evaluation reveals significant limitations in current MLLMs' ability to handle relation hallucinations. Additionally, we propose a novel confidence-based mitigation strategy, which reduces the hallucination rate by an average of 9.75% across three datasets, including Reefknot. Our work offers valuable insights for achieving trustworthy multimodal intelligence.
Problem

Research questions and friction points this paper is trying to address.

Evaluating relation hallucinations in multimodal language models
Addressing biases in current relation hallucination benchmarks
Mitigating relation hallucinations using confidence-based strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Reefknot benchmark for relation hallucinations
Uses Visual Genome scene graph dataset
Proposes confidence-based mitigation strategy
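The page does not spell out how the confidence-based mitigation works. As a rough illustration only, a minimal sketch of the general idea—answer a yes/no relation question, and flag low-confidence answers for re-querying or abstention instead of trusting a likely hallucinated relation—might look like this (the function name, threshold value, and input format are all illustrative assumptions, not the paper's actual method):

```python
def relation_answer_with_mitigation(candidate_probs, threshold=0.85):
    """Illustrative confidence-threshold check for a relation question.

    candidate_probs: mapping of answer strings (e.g. 'yes'/'no') to the
    model's probability for each. Returns the top answer, its confidence,
    and a flag indicating whether mitigation (re-query or abstention)
    should be triggered. Threshold is a hypothetical value.
    """
    answer = max(candidate_probs, key=candidate_probs.get)
    confidence = candidate_probs[answer]
    needs_mitigation = confidence < threshold
    return answer, confidence, needs_mitigation


# Confident prediction: accepted as-is.
print(relation_answer_with_mitigation({"yes": 0.97, "no": 0.03}))

# Near-chance prediction: flagged for mitigation.
print(relation_answer_with_mitigation({"yes": 0.55, "no": 0.45}))
```

In a real pipeline the probabilities would come from the MLLM's output distribution over answer tokens; this sketch only shows the thresholding step.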
Kening Zheng
Hong Kong University of Science and Technology (Guangzhou)
Junkai Chen
Hong Kong University of Science and Technology (Guangzhou)
Yibo Yan
East China Normal University
High-dimensional Statistics
Xin Zou
Hong Kong University of Science and Technology (Guangzhou)
Xuming Hu
Assistant Professor, HKUST(GZ) / HKUST
Natural Language Processing · Large Language Model