🤖 AI Summary
This work addresses the performance degradation of multimodal fake news detection in real-world scenarios where modalities are missing. It is the first to reveal a connection between attention heads in multimodal large language models and modality robustness, identifying a subset of "modality-critical heads" capable of verifying information from a single modality. Building on this insight, the study proposes a head-level explicit modality assignment mechanism, a lower-bound attention constraint, and a single-modality knowledge preservation strategy. Together, these innovations significantly improve detection robustness under modality-missing conditions while maintaining competitive performance when all modalities are present.
Abstract
Multimodal fake news detection (MFND) aims to verify news credibility by jointly exploiting textual and visual evidence. However, real-world news dissemination frequently suffers from missing modalities, e.g., deleted images or corrupted screenshots. Robust detection in this scenario therefore requires preserving strong verification ability for each individual modality, which is challenging in MFND due to insufficient learning of the low-contribution modality and scarce unimodal annotations. To address this issue, we propose Head-wise Modality Specialization within Multimodal Large Language Models (MLLMs) for robust MFND under missing modality. Specifically, we first systematically study attention heads in MLLMs and their relationship with performance under missing modality, showing that modality-critical heads serve as key carriers of unimodal verification ability through their modality specialization. Based on this observation, to better preserve verification ability for the low-contribution modality, we introduce a head-wise specialization mechanism that explicitly allocates these heads to different modalities and preserves their specialization through lower-bound attention constraints. Furthermore, to better exploit scarce unimodal annotations, we propose a Unimodal Knowledge Retention strategy that prevents these heads from drifting away from the unimodal knowledge learned from limited supervision. Experiments show that our method improves robustness under missing modality while preserving performance with full multimodal input.
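To make the lower-bound attention constraint concrete, one possible sketch is a hinge penalty on the attention mass that a modality-critical head places on tokens of its assigned modality: the penalty is zero once that mass exceeds a floor, and grows as the head drifts toward the other modality. The function name, the threshold `tau`, and this exact formulation are illustrative assumptions for exposition, not the paper's actual loss.

```python
import numpy as np

def lower_bound_attention_loss(attn, modality_mask, tau=0.5):
    """Hypothetical hinge-style lower-bound constraint for one attention head.

    attn          : (num_queries, num_keys) attention weights of a single
                    head; each row is a softmax distribution summing to 1.
    modality_mask : (num_keys,) boolean array, True for key tokens that
                    belong to the head's assigned modality.
    tau           : assumed lower bound on the attention mass the head
                    should keep on its assigned modality.

    Returns the mean hinge penalty max(0, tau - mass) over queries:
    zero when every query already attends to the assigned modality with
    mass >= tau, positive otherwise.
    """
    # Per-query attention mass falling on the assigned modality's tokens.
    mass = attn[:, modality_mask].sum(axis=1)
    # Penalize only queries whose mass falls below the lower bound tau.
    return np.maximum(0.0, tau - mass).mean()
```

For example, a head that keeps 60% of its attention on its assigned modality incurs no penalty at `tau=0.5`, while a head down at 10% is pushed back toward its specialization; in training this term would be added to the detection loss so the constraint acts as a soft floor rather than a hard mask.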