Multimodal Fake News Video Explanation Generation

📅 2025-01-15
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
To address the lack of interpretable, verifiable natural-language explanations in multimodal fake news video detection, this paper introduces the novel Fake News Video Explanation (FNVE) task and presents FakeNVE, the first multimodal fake news video dataset annotated with human-written explanatory rationales. Methodologically, we propose a multimodal Transformer-based cross-modal alignment encoder that fuses visual-frame, audio, and subtitle text features, coupled with a BART autoregressive decoder that generates attributional English explanations. Experiments show that our approach significantly outperforms baselines on BLEU, ROUGE, and BERTScore, as well as in human evaluation (92.3% adequacy, 94.1% fluency), striking a favorable balance between explanation readability and factual consistency. This work establishes a new explainable paradigm for multimodal fake news detection.

πŸ“ Abstract
Multimodal explanation involves assessing the veracity of varied content by drawing on multiple information modalities and jointly considering the relevance and consistency between them. Most existing fake news video detection methods focus on improving accuracy while ignoring the importance of providing explanations. In this paper, we propose a novel problem, Fake News Video Explanation (FNVE): given a multimodal news post containing both video and caption text, we aim to generate natural language explanations that reveal the truth behind the predictions. To this end, we develop FakeNVE, a new dataset of truthfulness explanations for multimodal posts, where each explanation is a natural language (English) sentence describing the attribution of a news thread. We benchmark FakeNVE using a multimodal Transformer-based architecture, with a BART-based autoregressive decoder as the generator. Empirical results are compelling for various baselines (applicable to FNVE) across multiple evaluation metrics. We also perform human evaluation of the generated explanations, achieving high scores for both adequacy and fluency.
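The abstract describes a Transformer-based encoder that fuses video, audio, and subtitle features before a BART decoder generates the explanation. A minimal sketch of one common way such cross-modal fusion works is shown below; all dimensions, the modality-type embeddings, and the single-head attention are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # toy hidden size (hypothetical)

# Hypothetical per-modality feature sequences (video frames, audio, subtitles).
video = rng.normal(size=(8, d))
audio = rng.normal(size=(12, d))
text = rng.normal(size=(20, d))

# Modality-type embeddings keep the streams distinguishable after concatenation.
type_emb = rng.normal(size=(3, d))
tokens = np.concatenate([
    video + type_emb[0],
    audio + type_emb[1],
    text + type_emb[2],
])  # (40, d) joint cross-modal sequence

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the joint sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(x.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ v

w_q, w_k, w_v = (rng.normal(size=(d, d)) for _ in range(3))
fused = self_attention(tokens, w_q, w_k, w_v)
# Every position now attends across all three modalities; in the full model,
# such fused states would serve as encoder memory for the BART decoder's
# cross-attention during autoregressive explanation generation.
print(fused.shape)  # (40, 16)
```

The key idea the sketch illustrates is that once the three modality streams share one token sequence, standard self-attention aligns them without any modality-specific wiring.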
Problem

Research questions and friction points this paper is trying to address.

Multimodal Explanation
Fake News Detection
Video-Text Consistency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Interpretation
Fake News Videos
BART Tool
Lizhi Chen
School of Computer Science and Technology, Soochow University, Suzhou, China 215000
Zhong Qian
Soochow University
Natural Language Processing
Peifeng Li
School of Computer Science and Technology, Soochow University, Suzhou, China 215000
Qiaoming Zhu
Soochow University
Natural Language Processing