🤖 AI Summary
Existing social bot detection methods generalize poorly under modality gaps, incomplete inputs, or out-of-distribution samples, and struggle with cross-domain scenarios and novel camouflage strategies. To address this, the work proposes a large language model-based multi-granularity semantic summarization framework that unifies heterogeneous signals into textual representations. The approach integrates task-oriented instruction tuning, domain-adversarial learning, and cross-domain contrastive learning to jointly optimize domain-invariant yet discriminative feature representations. By uniquely combining multi-granularity summarization with domain-invariant learning, the method significantly improves detection performance across multiple cross-dataset and cross-temporal evaluations, effectively enhancing distribution alignment, intra-class compactness, and inter-class separability.
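The summary's first step — unifying heterogeneous account signals into a textual representation for the LLM — can be sketched as a simple serialization routine. This is an illustrative assumption, not the paper's actual schema: the field names (`profile`, `posts`, `neighbors`) and the three granularity levels shown are hypothetical stand-ins for whatever signals the framework actually consumes.

```python
# Hypothetical sketch: serializing heterogeneous account signals into a
# multi-granularity textual summary prompt. Field names and prompt wording
# are illustrative assumptions, not the paper's actual design.
def build_summary_prompt(profile, posts, neighbors):
    # Account-level granularity: profile metadata flattened to text.
    profile_text = ", ".join(f"{k}: {v}" for k, v in profile.items())
    # Content-level granularity: a bounded sample of recent posts.
    posts_text = " | ".join(posts[:5])
    # Neighborhood-level granularity: a textual view of connected accounts.
    neighbor_text = ", ".join(neighbors[:5]) if neighbors else "none"
    return (
        "Summarize whether this account behaves like a social bot.\n"
        f"Profile: {profile_text}\n"
        f"Recent posts: {posts_text}\n"
        f"Connected accounts: {neighbor_text}"
    )
```

The resulting string would then be fed to the instruction-tuned LLM; the key idea is that missing modalities degrade to shorter text rather than breaking the model's input format.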
📄 Abstract
Social bots increasingly infiltrate online platforms through sophisticated disguises, threatening healthy information ecosystems. Existing detection methods often rely on modality-specific cues or local contextual features, making them brittle when modalities are missing or inputs are incomplete. Moreover, most approaches assume similar training and test distributions, which limits their robustness to out-of-distribution (OOD) samples and emerging bot types. To address these challenges, we propose Multi-Granularity Summarization and Domain-Invariant Learning (MGDIL), a unified framework for robust social bot detection under domain shift. MGDIL first transforms heterogeneous signals into unified textual representations through LLM-based multi-granularity summarization. Building on these representations, we design a collaborative optimization framework that integrates task-oriented LLM instruction tuning with domain-invariant representation learning. Specifically, task-oriented instruction tuning enhances the LLM's ability to capture subtle semantic cues and implicit camouflage patterns, while domain-adversarial learning and cross-domain contrastive learning are jointly employed to mitigate distribution shifts across datasets and time periods. Through this joint optimization, MGDIL learns stable and discriminative domain-invariant features, improving cross-domain social bot detection through better distribution alignment, stronger intra-class compactness, and clearer inter-class separation.
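The two domain-invariance objectives named in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the paper's exact formulation: `grad_reverse` shows the standard gradient-reversal transform commonly used for domain-adversarial learning, and `cross_domain_contrastive_loss` is a generic supervised InfoNCE-style loss in which embeddings sharing a bot/human label (possibly drawn from different domains) act as positives; temperature `tau` and all function names are assumptions for illustration.

```python
import math

def grad_reverse(grad, lam=1.0):
    # Gradient reversal layer, shown as its backward transform only: the
    # forward pass is the identity, while gradients flowing back from the
    # domain classifier are scaled by -lam, pushing the encoder toward
    # features the domain classifier cannot separate (domain-invariance).
    return [-lam * g for g in grad]

def cross_domain_contrastive_loss(embeddings, labels, tau=0.1):
    # Supervised InfoNCE-style loss over L2-normalized embeddings:
    # same-label pairs are pulled together (intra-class compactness),
    # different-label pairs are pushed apart (inter-class separation).
    def normalize(v):
        n = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / n for x in v]
    z = [normalize(v) for v in embeddings]
    n = len(z)
    # Cosine-similarity logits, scaled by the temperature.
    sim = [[sum(a * b for a, b in zip(z[i], z[j])) / tau
            for j in range(n)] for i in range(n)]
    total, count = 0.0, 0
    for i in range(n):
        denom = sum(math.exp(sim[i][j]) for j in range(n) if j != i)
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors with no same-label partner are skipped
        total += -sum(math.log(math.exp(sim[i][j]) / denom)
                      for j in positives) / len(positives)
        count += 1
    return total / count
```

In a full training loop these terms would be combined with the instruction-tuning task loss into one joint objective; the sketch only shows why well-clustered, label-aligned embeddings receive a lower contrastive penalty than mixed ones.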