Domain-based user embedding for competing events on social media

📅 2023-08-28
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Problem: Modeling users during competing social media events, such as political movements or public health crises, is challenging because it requires representations that are scalable, interpretable, and computationally efficient.
Method: This paper proposes a lightweight user embedding method grounded in a URL domain co-occurrence network. It uses the implicit behavioral associations between users, captured through the domains they share, as the primary representational signal. It constructs a domain co-occurrence graph and derives low-dimensional user embeddings directly from its adjacency matrix, avoiding complex graph neural architectures and resource-intensive language models.
Contribution/Results: Evaluated on a multi-topic COVID-19 Twitter dataset with logistic regression and SVM classifiers, the method achieves 5.2–9.8% higher accuracy than baselines based on retweet networks and pretrained language models, while reducing training time by 67%. It thus combines high discriminative power with low computational overhead.
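The summary above describes building a domain co-occurrence graph and deriving low-dimensional user embeddings from its adjacency matrix. The block below is a minimal NumPy sketch of one way such a pipeline could look; it is not the authors' implementation. It assumes (hypothetically) that two domains co-occur when the same user shared both, that domain vectors come from an adjacency spectral embedding (top eigenvectors by eigenvalue magnitude, scaled by the square root of that magnitude), and that a user's embedding is the mean of the vectors of the domains they shared. The function names `build_cooccurrence` and `embed_users` and the toy domains are illustrative only.

```python
import numpy as np


def build_cooccurrence(user_domains):
    """Build a user-domain incidence matrix B and a domain co-occurrence
    adjacency matrix A = B^T B with the diagonal zeroed.

    Assumption (hypothetical): two domains are linked when at least one user
    shared both; the edge weight counts how many users did so.
    """
    domains = sorted({d for ds in user_domains.values() for d in ds})
    index = {d: i for i, d in enumerate(domains)}
    users = sorted(user_domains)
    B = np.zeros((len(users), len(domains)))
    for u, user in enumerate(users):
        for d in set(user_domains[user]):
            B[u, index[d]] = 1.0
    A = B.T @ B
    np.fill_diagonal(A, 0.0)  # drop self-loops
    return users, domains, B, A


def embed_users(user_domains, dim=2):
    """Return a dict mapping each user to a `dim`-dimensional vector."""
    users, domains, B, A = build_cooccurrence(user_domains)
    # A is symmetric, so eigh applies; keep the eigenvectors with the
    # largest-magnitude eigenvalues as domain coordinates (adjacency
    # spectral embedding).
    vals, vecs = np.linalg.eigh(A)
    order = np.argsort(-np.abs(vals))[:dim]
    domain_vecs = vecs[:, order] * np.sqrt(np.abs(vals[order]))
    # Hypothetical user embedding: mean of the vectors of shared domains.
    counts = B.sum(axis=1, keepdims=True)
    user_vecs = (B @ domain_vecs) / np.maximum(counts, 1.0)
    return {user: user_vecs[i] for i, user in enumerate(users)}


# Toy usage with made-up users and placeholder domains.
toy = {
    "u1": ["a.example", "b.example"],
    "u2": ["a.example", "b.example", "hub.example"],
    "u3": ["c.example", "d.example"],
    "u4": ["c.example", "d.example", "hub.example"],
    "u5": ["b.example", "a.example"],
}
emb = embed_users(toy, dim=2)
```

Because the user vector depends only on the set of shared domains, users "u1" and "u5" receive identical embeddings here. Averaging domain vectors is one simple pooling choice; weighting by share counts would be an equally plausible variant.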
📝 Abstract
Online social networks offer vast opportunities for computational social science, but effective user embedding is crucial for downstream tasks. Traditionally, researchers have used pre-defined network-based user features, such as degree and centrality measures, and/or content-based features, such as posts and reposts. However, these measures may not capture the complex characteristics of social media users. In this study, we propose a user embedding method based on the URL domain co-occurrence network, which is simple but effective for representing social media users in competing events. We assessed the performance of this method in binary classification tasks using benchmark datasets that included Twitter users related to COVID-19 infodemic topics (QAnon, Biden, Ivermectin). Our results revealed that user embeddings generated directly from the retweet network, and those based on language, performed below expectations. In contrast, our domain-based embeddings outperformed these methods while reducing computation time. These findings suggest that the domain-based user embedding can serve as an effective tool to characterize social media users participating in competing events, such as political campaigns and public health crises.
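The abstract evaluates the embeddings through binary classification of users. The block below sketches the general shape of such an evaluation: a plain logistic-regression classifier trained by batch gradient descent in NumPy and scored by accuracy on a held-out split. The synthetic two-cluster data, the 70/30 split, the learning rate, and the epoch count are arbitrary assumptions; the sketch does not reproduce the paper's datasets, classifiers, or results.

```python
import numpy as np


def train_logreg(X, y, lr=0.1, epochs=500):
    """Fit logistic-regression weights (last entry is the bias) by batch
    gradient descent on the mean log-loss."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))  # predicted P(y = 1)
        grad = Xb.T @ (p - y) / len(y)     # gradient of mean log-loss
        w -= lr * grad
    return w


def predict(w, X):
    """Return 0/1 labels by thresholding the linear score at zero."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return (Xb @ w > 0).astype(int)


def accuracy(w, X, y):
    return float(np.mean(predict(w, X) == y))


# Synthetic stand-in for embeddings of users from two competing groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 0.5, size=(50, 2)),
               rng.normal(+1.0, 0.5, size=(50, 2))])
y = np.array([0] * 50 + [1] * 50)

# Shuffle, then hold out 30% of users for testing.
perm = rng.permutation(len(y))
X, y = X[perm], y[perm]
split = int(0.7 * len(y))
w = train_logreg(X[:split], y[:split])
test_acc = accuracy(w, X[split:], y[split:])
```

In practice the rows of `X` would be user embeddings and `y` the group labels for one topic; a linear SVM could be substituted for the classifier without changing the evaluation loop.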
Problem

Developed domain-based user embedding for social media analysis
Addressed limitations of network-based features in user representation
Improved classification performance for competing events on Twitter
Innovation

Embedding users via URL domain co-occurrence network
Outperforming retweet and language-based embedding methods
Reducing computation time while maintaining effectiveness
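Embedding users via a URL domain co-occurrence network presupposes reducing each shared URL to a domain and grouping domains by user. The block below is a minimal standard-library sketch of that preprocessing step. It assumes (hypothetically) that the domain is the lowercased host of the URL with a leading `www.` removed; how the paper treats other subdomains or link shorteners is not stated here, so this normalization and the helper names `url_to_domain` and `user_domain_sets` are illustrative choices.

```python
from collections import defaultdict
from urllib.parse import urlparse


def url_to_domain(url):
    """Return the lowercased host of `url` without a leading 'www.'.

    Hypothetical normalization: subdomains other than 'www' are kept,
    ports are dropped, and short links (e.g. bit.ly) are not expanded.
    """
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to locate the host
    host = urlparse(url).hostname or ""  # .hostname is already lowercased
    return host[4:] if host.startswith("www.") else host


def user_domain_sets(shares):
    """Group (user, url) pairs into {user: set of domains}."""
    result = defaultdict(set)
    for user, url in shares:
        domain = url_to_domain(url)
        if domain:
            result[user].add(domain)
    return dict(result)


# Toy usage with made-up users and URLs.
shares = [
    ("u1", "https://www.example.com/a"),
    ("u1", "http://Example.com/b"),
    ("u2", "news.example.org/x"),
]
domains_by_user = user_domain_sets(shares)
```

Here both of "u1"'s links collapse to `example.com`, so `domains_by_user` is `{"u1": {"example.com"}, "u2": {"news.example.org"}}`. Sets are used because co-occurrence only needs to know whether a user shared a domain, not how often.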
Wentao Xu
Associate Professor, University of Science and Technology of China
Human-AI Interaction, Human behavior, AI Agents, LLM, Social computing
Kazutoshi Sasahara
School of Environment and Society, Tokyo Institute of Technology, Japan