🤖 AI Summary
To address the security risks posed by the rapid spread of misinformation in mobile and wireless networks, this paper proposes a training-free, retrieval-based multimodal fact verification system. The method combines pretrained vision-language models with large language models and verifies image-text pairs at multiple scales through dynamic cross-modal retrieval, which avoids the susceptibility of conventional supervised models to adversarial attacks and data poisoning. Its lightweight architecture supports deployment on resource-constrained edge devices. Evaluated on two mainstream fact-checking benchmarks, the system achieves state-of-the-art (SOTA) performance and is significantly more robust than baseline approaches. The experimental results support its effectiveness and security advantages in bandwidth- and compute-limited wireless environments.
📝 Abstract
The rapid spread of misinformation in mobile and wireless networks presents critical security challenges. This study introduces a training-free, retrieval-based multimodal fact verification system that leverages pretrained vision-language models and large language models for credibility assessment. By dynamically retrieving and cross-referencing trusted data sources, our approach mitigates vulnerabilities of traditional training-based models, such as susceptibility to adversarial attacks and data poisoning. Its lightweight design also enables integration with edge devices without extensive on-device processing. On two fact-checking benchmarks, the system achieves SOTA results, confirming its effectiveness in misinformation detection and its robustness against various attack vectors. These results highlight its potential to enhance security in mobile and wireless communication environments.
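The retrieve-then-cross-reference idea in the abstract can be illustrated with a toy sketch. This is not the authors' pipeline: the abstract does not specify models, retrieval sources, or decision rules, so everything below is an assumption made for illustration. A bag-of-words vector stands in for a pretrained encoder, an image caption stands in for the visual input, and the names `embed`, `retrieve`, `verify`, and the `threshold` value are hypothetical. The only point it demonstrates is that a verdict can be produced from retrieved trusted text at query time, with no training step.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a pretrained encoder (assumption): bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, trusted_sources, k=2):
    """Return the k trusted documents most similar to the query, with scores."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in trusted_sources]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


def verify(claim, image_caption, trusted_sources, threshold=0.5):
    """Cross-reference a claim against retrieved evidence (hypothetical rule).

    The caption is a text proxy for the image; a real system would pass the
    image itself to a vision-language model. The threshold is arbitrary.
    """
    evidence = retrieve(claim + " " + image_caption, trusted_sources)
    claim_vec = embed(claim)
    best = max((cosine(claim_vec, embed(doc)) for _, doc in evidence), default=0.0)
    verdict = "supported" if best >= threshold else "unverified"
    return verdict, best


if __name__ == "__main__":
    sources = [
        "The city council opened a new bridge over the river on Monday.",
        "Heavy rain flooded the downtown subway station last week.",
    ]
    print(verify("The city council opened a new bridge on Monday",
                 "a bridge over a river", sources))
    print(verify("Aliens landed at the airport", "a crowd at night", sources))
```

Because the evidence store is consulted at query time, updating `sources` changes the verdict without retraining anything, which is the property the abstract attributes to retrieval-based verification. In a realistic setting the similarity threshold would be replaced by a language model's judgment over the retrieved evidence.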