Multimodal Fake News Video Explanation Generation

📅 2025-01-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the lack of interpretable and verifiable natural-language explanations in multimodal fake news video detection, this paper introduces the novel Fake News Video Explanation (FNVE) task and presents FakeNVE, the first multimodal fake news video dataset annotated with human-written explanatory rationales. Methodologically, it proposes a multimodal Transformer-based cross-modal alignment encoder that fuses visual frame, audio, and subtitle text features, coupled with a BART autoregressive decoder that generates attributional English explanations. Experiments show that the approach significantly outperforms baselines on the BLEU, ROUGE, and BERTScore metrics, as well as in human evaluation (92.3% sufficiency, 94.1% fluency), striking a favorable balance between explanation readability and factual consistency. This work establishes a new explainable paradigm for multimodal fake news detection.
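The cross-modal fusion described above can be illustrated with a minimal sketch. This is not the paper's encoder: it assumes single-head scaled dot-product attention without learned projections, pre-extracted feature vectors of equal dimension for every modality, and a hypothetical `fuse` helper in which subtitle-text tokens attend over the concatenated frame and audio features and the attended context is added back residually. The summary does not specify any of these details.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: List[Vector], keys: List[Vector],
                    values: List[Vector]) -> List[Vector]:
    """Scaled dot-product attention: each query (e.g. a subtitle token)
    attends over keys/values taken from another modality (e.g. frames).
    Assumption: no learned W_Q/W_K/W_V projections, a single head."""
    d = len(keys[0])
    out = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, dimension by dimension.
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def fuse(text: List[Vector], frames: List[Vector], audio: List[Vector]) -> List[Vector]:
    """Hypothetical fusion step: text tokens attend to the concatenated
    frame and audio features; the result is added to the text residually.
    Assumes at least one frame or audio vector is present."""
    context = frames + audio
    attended = cross_attention(text, context, context)
    return [[t + a for t, a in zip(tok, ctx)] for tok, ctx in zip(text, attended)]
```

For example, `fuse([[1.0, 0.0]], [[0.0, 2.0]], [])` returns `[[1.0, 2.0]]`: with a single context vector the attention weight is 1, so the text token simply absorbs that frame's features. A practical encoder would add learned query/key/value projections, multiple heads, and layer normalization on top of this core operation.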

📝 Abstract

Multimodal explanation involves assessing the veracity of diverse content and relies on multiple information modalities to jointly consider the relevance and consistency between them. Most existing fake news video detection methods focus on improving accuracy while ignoring the importance of providing explanations. In this paper, we propose a novel problem, Fake News Video Explanation (FNVE): given a multimodal news post containing both video and caption text, we aim to generate natural language explanations that reveal the truth behind the predictions. To this end, we develop FakeNVE, a new dataset of explanations of the truthfulness of multimodal posts, where each explanation is a natural language (English) sentence describing the attribution of a news thread. We benchmark FakeNVE using a multimodal transformer-based architecture, with a BART-based autoregressive decoder as the generator. Empirical results show compelling performance for various baselines applicable to FNVE across multiple evaluation metrics. We also perform a human evaluation of the generated explanations, which achieve high scores for both adequacy and fluency.
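Autoregressive generation, as performed by the BART-based generator, emits one token at a time, each conditioned on the tokens already produced. The sketch below assumes greedy decoding (the abstract does not say whether greedy or beam search is used) and replaces the BART decoder with a hypothetical bigram lookup table, `TOY_BIGRAMS`; in the real system the next-token scores would come from the decoder attending to the fused multimodal encoding.

```python
from typing import Callable, Dict, List

# A scorer maps the tokens generated so far to a score per candidate next token.
# Hypothetical stand-in for a BART decoder conditioned on the multimodal encoding.
NextTokenScorer = Callable[[List[str]], Dict[str, float]]


def greedy_decode(score_next: NextTokenScorer, bos: str = "<s>",
                  eos: str = "</s>", max_len: int = 20) -> List[str]:
    """Generate tokens left to right, always taking the highest-scoring one."""
    tokens = [bos]
    for _ in range(max_len):
        scores = score_next(tokens)
        best = max(scores, key=scores.get)
        if best == eos:  # stop as soon as end-of-sequence wins
            break
        tokens.append(best)
    return tokens[1:]  # drop the BOS marker


# Hypothetical toy "decoder": scores depend only on the previous token.
TOY_BIGRAMS: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.9, "video": 0.1},
    "the": {"video": 0.8, "</s>": 0.2},
    "video": {"is": 0.7, "</s>": 0.3},
    "is": {"edited": 0.6, "</s>": 0.4},
    "edited": {"</s>": 1.0},
}


def toy_scorer(prefix: List[str]) -> Dict[str, float]:
    return TOY_BIGRAMS[prefix[-1]]


print(greedy_decode(toy_scorer))  # ['the', 'video', 'is', 'edited']
```

The `max_len` cap guarantees termination even if the scorer never prefers the end-of-sequence token; with `max_len=2` the same toy scorer yields `['the', 'video']`.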

Problem

Research questions and friction points this paper is trying to address.

Multimodal Explanation

Fake News Detection

Video-Text Consistency

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Interpretation

Fake News Videos

BART Tool

🔎 Similar Papers

Official-NV: An LLM-Generated News Video Dataset for Multimodal Fake News Detection