🤖 AI Summary
To address the lack of interpretability and the inability to localize misleading segments across modalities in existing misinformation video detection, this paper introduces the GroundMM task: verifying multimodal content and jointly localizing misleading segments across the textual, audio, and visual modalities. To support this task, the authors present GroundLie360, the first real-world dataset for this task, featuring fine-grained spatiotemporal annotations, a taxonomy of misinformation types, and verification grounded in fact-checking evidence and annotator reasoning. They further propose FakeMark, a question-answering–driven vision-language model baseline that integrates single- and cross-modal cues for interpretable detection and precise localization. Experiments demonstrate the task's substantial difficulty. Together, GroundLie360 and FakeMark constitute the first benchmark dedicated to explainable multimodal misinformation grounding, advancing evaluation paradigms and methodological development in this emerging field.
📝 Abstract
The proliferation of online misinformation videos poses serious societal risks. Current datasets and detection methods primarily target binary classification or single-modality localization based on post-processed data, lacking the interpretability needed to counter persuasive misinformation. In this paper, we introduce the task of Grounding Multimodal Misinformation (GroundMM), which verifies multimodal content and localizes misleading segments across modalities. We present the first real-world dataset for this task, GroundLie360, featuring a taxonomy of misinformation types, fine-grained annotations across text, speech, and visuals, and validation with Snopes evidence and annotator reasoning. We also propose a VLM-based, QA-driven baseline, FakeMark, using single- and cross-modal cues for effective detection and grounding. Our experiments highlight the challenges of this task and lay a foundation for explainable multimodal misinformation detection.