AI Summary
To address the covert misuse of highly realistic social bots driven by generative AI, this paper proposes a robust multimodal detection framework. Methodologically, it jointly models textual, visual, and user-statistical features and introduces the Cross-Modal Residual Cross-Attention (CMRCA) mechanism, the first to enable fine-grained, calibratable feature alignment and complementary enhancement across heterogeneous modalities. The framework integrates dedicated multimodal encoders with a graph neural network to explicitly capture the relational structure among users. Evaluated on the TwiBot-22 benchmark, it achieves an F1-score of 92.7%, surpassing the state of the art by 3.2 percentage points and showing markedly better detection of novel, evasive bot variants. The core contributions are the design of the CMRCA mechanism and its first application to social bot detection, establishing a new paradigm for robust, multimodal identification of adversarial bots.
Abstract
Although social bots can be engineered for constructive applications, their potential for misuse in manipulative schemes and malware distribution cannot be overlooked. This dichotomy underscores the critical need to detect social bots on social media platforms. Advances in artificial intelligence have improved the abilities of social bots, allowing them to generate content that is almost indistinguishable from human-created content. These advances in turn demand more sophisticated detection techniques to accurately identify such automated accounts. Given the heterogeneous information landscape on social media, spanning images, texts, and user statistical features, we propose MSM-BD, a Multimodal Social Media Bot Detection approach that exploits this heterogeneous information. MSM-BD incorporates specialized encoders for each type of information and introduces a cross-modal fusion technique, Cross-Modal Residual Cross-Attention (CMRCA), to enhance detection accuracy. We validate the effectiveness of our model through extensive experiments on the TwiBot-22 dataset.
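The abstract does not give CMRCA's equations, so the following is only a rough illustration of the general idea of residual cross-attention, not the MSM-BD design. In this NumPy sketch, tokens from one modality act as queries over another modality's tokens, and the attended result is added back to the original query features (the residual path). The function names `residual_cross_attention` and `fuse`, the single attention head, the random projection matrices, the mean pooling, and the choice to simply concatenate a user-statistics vector are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def residual_cross_attention(query_feats, context_feats, w_q, w_k, w_v):
    """Hypothetical single-head residual cross-attention.

    query_feats:   (n_q, d) tokens from one modality (e.g. text)
    context_feats: (n_c, d) tokens from another modality (e.g. image)
    w_q, w_k, w_v: (d, d) projection matrices
    Returns (n_q, d): the query features plus the attended context.
    """
    q = query_feats @ w_q
    k = context_feats @ w_k
    v = context_feats @ w_v
    d = q.shape[-1]
    # Scaled dot-product attention: each query token weights the context tokens
    weights = softmax(q @ k.T / np.sqrt(d), axis=-1)
    attended = weights @ v
    # Residual connection preserves the query modality's own signal
    return query_feats + attended


def fuse(text, image, stats, seed=0):
    """Hypothetical fusion: text and image attend to each other, then pool."""
    rng = np.random.default_rng(seed)
    d = text.shape[-1]

    def proj():
        # Random projection standing in for learned weights (assumption)
        return rng.standard_normal((d, d)) / np.sqrt(d)

    text_enh = residual_cross_attention(text, image, proj(), proj(), proj())
    image_enh = residual_cross_attention(image, text, proj(), proj(), proj())
    # Mean-pool each enhanced modality and append the user-statistics vector
    return np.concatenate([text_enh.mean(axis=0), image_enh.mean(axis=0), stats])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    text = rng.standard_normal((5, 16))   # 5 text tokens, dimension 16
    image = rng.standard_normal((3, 16))  # 3 image patches, dimension 16
    stats = rng.standard_normal(16)       # user statistics projected to dimension 16
    print(fuse(text, image, stats).shape)  # (48,)
```

The residual path means that if the attended context carries little useful signal, each modality's features still pass through largely intact; in a trained model the projections would be learned, and the fused vector would feed a downstream classifier.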