🤖 AI Summary
To counter the proliferation of deceptive and contextually manipulated multimedia content across multilingual social media during crisis events, this paper proposes a scalable multimodal fact-checking system. Methodologically, it introduces the first unified verification framework to integrate visual forensics, multimodal large language models, and joint semantic–temporal–geographic alignment, enabling cross-modal consistency analysis and contextual accuracy assessment. The system supports multilingual inputs and automatically generates interpretable verification reports that balance public accessibility with expert utility. Evaluated on the large-scale multilingual benchmark of the ACM Multimedia 2025 Grand Challenge, it achieves significant improvements in detecting false and misused media, excelling in particular at semantic manipulation and spatiotemporal mismatch scenarios. This work delivers the first open-source fact-checking tool that combines robustness, interpretability, and multilingual adaptability, giving news organizations and fact-checking bodies a practical, deployable solution for real-world crisis response.
📝 Abstract
The proliferation of multimedia content on social media platforms has dramatically transformed how information is consumed and disseminated. While this shift enables real-time coverage of global events, it also facilitates the rapid spread of misinformation and disinformation, especially during crises such as wars, natural disasters, or elections. The rise of synthetic media and the reuse of authentic content in misleading contexts have intensified the need for robust multimedia verification tools. In this paper, we present a comprehensive system developed for the ACM Multimedia 2025 Grand Challenge on Multimedia Verification. Our system assesses the authenticity and contextual accuracy of multimedia content in multilingual settings and generates both expert-oriented verification reports and accessible summaries for the general public. We introduce a unified verification pipeline that integrates visual forensics, textual analysis, and multimodal reasoning, and propose a hybrid approach to detect out-of-context (OOC) media through semantic similarity, temporal alignment, and geolocation cues. Extensive evaluations on the Grand Challenge benchmark demonstrate the system's effectiveness across diverse real-world scenarios. Our contributions advance the state of the art in multimedia verification and offer practical tools for journalists, fact-checkers, and researchers confronting information integrity challenges in the digital age.
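The hybrid out-of-context (OOC) detection described above fuses semantic similarity with temporal and geolocation consistency. The paper does not specify the fusion rule; the sketch below is a hypothetical illustration of how such cues might be combined into a single OOC score — the function names, weights, and decay constants are assumptions, not the authors' implementation.

```python
import math
from datetime import date

def haversine_km(lat1, lon1, lat2, lon2):
    # Great-circle distance between two (lat, lon) points, in kilometres.
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def ooc_score(claim_emb, media_emb, claim_date, media_date,
              claim_loc, media_loc,
              w_sem=0.5, w_time=0.25, w_geo=0.25):
    """Combine semantic, temporal, and geographic consistency into one
    out-of-context score in [0, 1]; higher means more likely OOC.
    Weights and decay scales are illustrative assumptions."""
    sem = cosine(claim_emb, media_emb)            # 1.0 = semantically consistent
    days = abs((claim_date - media_date).days)
    time_ok = math.exp(-days / 30.0)              # decays over roughly a month
    km = haversine_km(*claim_loc, *media_loc)
    geo_ok = math.exp(-km / 100.0)                # decays over roughly 100 km
    consistency = w_sem * sem + w_time * time_ok + w_geo * geo_ok
    return 1.0 - consistency
```

For example, media whose embedding, capture date, and location all match the claim yields a score near 0, while an old image from a distant location reused under an unrelated claim scores near 1 and would be flagged for review.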