Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models

📅 2024-08-18
🏛️ arXiv.org
📈 Citations: 6
Influential: 0
🤖 AI Summary
Relation hallucination—the erroneous generation of semantic relationships among objects—has long been overlooked in multimodal large language models (MLLMs); existing evaluation benchmarks suffer from ambiguous definitions, severe data bias, and a lack of dedicated mitigation strategies. Method: We propose Reefknot, the first comprehensive benchmark for relation hallucination, comprising 21K real-world samples. It formally defines the problem from dual perceptual and cognitive perspectives and constructs a relation-centric corpus derived from Visual Genome. Our method introduces relational triple modeling, scene graph analysis, and confidence calibration—the first dedicated mitigation framework for this issue. Results: Experiments on Reefknot and two external benchmarks demonstrate an average 9.75% reduction in relation hallucination rates, exposing systematic relational reasoning deficiencies across mainstream MLLMs. Reefknot establishes a reproducible, fine-grained evaluation paradigm for trustworthy multimodal intelligence.

📝 Abstract
Hallucination issues continue to affect multimodal large language models (MLLMs), with existing research mainly addressing object-level or attribute-level hallucinations, neglecting the more complex relation hallucinations that require advanced reasoning. Current benchmarks for relation hallucinations lack detailed evaluation and effective mitigation, and their datasets often suffer from biases due to systematic annotation processes. To address these challenges, we introduce Reefknot, a comprehensive benchmark targeting relation hallucinations, comprising over 20,000 real-world samples. We provide a systematic definition of relation hallucinations, integrating perceptive and cognitive perspectives, and construct a relation-based corpus using the Visual Genome scene graph dataset. Our comparative evaluation reveals significant limitations in current MLLMs' ability to handle relation hallucinations. Additionally, we propose a novel confidence-based mitigation strategy, which reduces the hallucination rate by an average of 9.75% across three datasets, including Reefknot. Our work offers valuable insights for achieving trustworthy multimodal intelligence.
Problem

Research questions and friction points this paper is trying to address.

Evaluating relation hallucinations in multimodal language models
Addressing biases in current relation hallucination benchmarks
Mitigating relation hallucinations using confidence-based strategies
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Reefknot benchmark for relation hallucinations
Uses Visual Genome scene graph dataset
Proposes confidence-based mitigation strategy
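The page does not spell out how the confidence-based mitigation works. As a rough illustration only, a minimal sketch of the general idea—answer a yes/no relation question, and flag low-confidence answers for re-querying or abstention instead of trusting a likely hallucinated relation—might look like this (the function name, threshold value, and input format are all illustrative assumptions, not the paper's actual method):

```python
def relation_answer_with_mitigation(candidate_probs, threshold=0.85):
    """Illustrative confidence-threshold check for a relation question.

    candidate_probs: mapping of answer strings (e.g. 'yes'/'no') to the
    model's probability for each. Returns the top answer, its confidence,
    and a flag indicating whether mitigation (re-query or abstention)
    should be triggered. Threshold is a hypothetical value.
    """
    answer = max(candidate_probs, key=candidate_probs.get)
    confidence = candidate_probs[answer]
    needs_mitigation = confidence < threshold
    return answer, confidence, needs_mitigation


# Confident prediction: accepted as-is.
print(relation_answer_with_mitigation({"yes": 0.97, "no": 0.03}))

# Near-chance prediction: flagged for mitigation.
print(relation_answer_with_mitigation({"yes": 0.55, "no": 0.45}))
```

In a real pipeline the probabilities would come from the MLLM's output distribution over answer tokens; this sketch only shows the thresholding step.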
Kening Zheng
Hong Kong University of Science and Technology (Guangzhou)
Junkai Chen
Hong Kong University of Science and Technology (Guangzhou)
Yibo Yan
East China Normal University
High-dimensional Statistics
Xin Zou
Hong Kong University of Science and Technology (Guangzhou)
Xuming Hu
Assistant Professor, HKUST(GZ) / HKUST
Natural Language Processing · Large Language Model