🤖 AI Summary
Existing social media user modeling approaches rely on single-modal data and struggle to integrate heterogeneous, multi-source signals. To address this, we propose the first unified multi-view representation framework that jointly models temporal posting behavior, textual content, user profile attributes, and social network interactions. Methodologically, we innovatively integrate Transformer-based temporal encoding, cross-modal feature alignment, graph-based link prediction, and contrastive learning, coupled with a dual-objective collaborative training scheme to yield representations that are cross-modal, interpretable, and socially aware. Evaluated on three downstream tasks—fake account detection, sentiment polarization analysis, and extremist community participation prediction—our framework achieves substantial improvements over state-of-the-art methods: +12.6% in F1-score, 0.89 correlation with polarization metrics, and 0.93 AUC for extremist community participation prediction.
📝 Abstract
Social media user representation learning aims to capture user preferences, interests, and behaviors in low-dimensional vector representations. These representations are critical to a range of social problems, including predicting user behaviors and detecting inauthentic accounts. However, existing methods are either designed for commercial applications, or rely on specific features like text contents, activity patterns, or platform metadata, failing to holistically model user behavior across different modalities. To address these limitations, we propose SoMeR, a Social Media user Representation learning framework that incorporates temporal activities, text contents, profile information, and network interactions to learn comprehensive user portraits. SoMeR encodes user post streams as sequences of time-stamped textual features, uses transformers to embed this along with profile data, and jointly trains with link prediction and contrastive learning objectives to capture user similarity. We demonstrate SoMeR's versatility through three applications: 1) Identifying information operation driver accounts, 2) Measuring online polarization after major events, and 3) Predicting future user participation in Reddit hate communities. SoMeR provides new solutions to better understand user behavior in the socio-political domains, enabling more informed decisions and interventions.