🤖 AI Summary
Existing fake news detection models suffer from noisy, low-veracity training data, particularly in the video modality. To address this, we introduce Official-NV, a high-quality multimodal fake news detection dataset curated exclusively from officially published news videos; it is expanded via LLM-generated candidates refined by manual verification, improving data credibility and annotation accuracy. Methodologically, we propose OFNVD, a baseline that uses a GLU attention mechanism to capture key information from multimodal features and a cross-modal Transformer for feature enhancement and modality aggregation. Benchmark experiments demonstrate the model's effectiveness against existing baselines. This work contributes (1) an authoritative multimodal dataset grounded in official news sources, and (2) a strong new baseline for trustworthy news analysis, providing both data and model resources for robust multimodal misinformation detection.
📝 Abstract
News media, especially video news media, have penetrated every aspect of daily life, which also brings the risk of fake news. Multimodal fake news detection has therefore garnered increasing attention. However, existing datasets are comprised of user-uploaded videos and contain an excess of superfluous data, which introduces noise into model training. To address this issue, we construct a dataset named Official-NV, comprising officially published news videos. The crawled official videos are augmented through LLM-based generation and manual verification, thereby expanding the dataset. We also propose a new baseline model called OFNVD, which captures key information from multimodal features through a GLU attention mechanism and performs feature enhancement and modal aggregation via a cross-modal Transformer. Benchmarking the dataset and baselines demonstrates the effectiveness of our model in multimodal news detection.
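The abstract does not detail the GLU attention mechanism, but a gated linear unit in this setting typically multiplies a linear projection of the fused features by a learned sigmoid gate, letting the model suppress noisy dimensions from either modality. Below is a minimal, hypothetical NumPy sketch of such a gate; the function name `glu` and all weight shapes are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

def glu(x, W, V, b, c):
    """Gated Linear Unit: (x @ W + b) * sigmoid(x @ V + c).

    The sigmoid gate lies in (0, 1) per dimension, so it softly selects
    which feature dimensions pass through -- a simple way to damp noisy
    modality features. (Illustrative sketch, not the paper's exact layer.)
    """
    gate = 1.0 / (1.0 + np.exp(-(x @ V + c)))  # elementwise sigmoid gate
    return (x @ W + b) * gate

# Toy usage: two fused text+video feature vectors of width 8, projected to 4.
rng = np.random.default_rng(0)
d_in, d_out = 8, 4
x = rng.normal(size=(2, d_in))
W = rng.normal(size=(d_in, d_out))
V = rng.normal(size=(d_in, d_out))
b = np.zeros(d_out)
c = np.zeros(d_out)
y = glu(x, W, V, b, c)
print(y.shape)  # (2, 4)
```

Because the gate is bounded in (0, 1), each output magnitude can never exceed the ungated projection, which is the sense in which the gate filters rather than amplifies.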