🤖 AI Summary
This study contributes to computational social science by proposing a topic-agnostic framework for fine-grained identification of users' levels of participation in collective action on social media. Grounded in social movement mobilization theory, it categorizes participation into four levels: problem awareness → call to action → intention expression → actual engagement.
Method: We introduce a theory-driven participation classification schema and develop a detection pipeline in which Reddit comments, labeled through crowdsourcing, are used to train BERT classifiers and to fine-tune Llama3 models.
Contribution/Results: The smaller language models detect expressions of participation reliably (weighted F1-score of 0.71) and rival larger models in capturing nuanced levels of participation. Applied to Reddit, the method characterizes online communities in ways that differ from topic modeling, stance detection, and keyword matching. The framework provides a theoretically grounded source of annotations for large-scale analysis of collective action.
📝 Abstract
Social media play a key role in mobilizing collective action and offer an opportunity to study the pathways that lead individuals to actively engage in addressing global challenges. However, quantitative research in this area has been limited by the absence of granular, large-scale ground truth about the level of participation in collective action among individual social media users. To address this limitation, we present a novel suite of text classifiers designed to identify expressions of participation in collective action from social media posts in a topic-agnostic fashion. Grounded in the theoretical framework of social movement mobilization, our classification captures participation and categorizes it into four levels: recognizing collective issues, engaging in calls-to-action, expressing intention of action, and reporting active involvement. We constructed a labeled training dataset of Reddit comments through crowdsourcing, which we used to train BERT classifiers and fine-tune Llama3 models. Our findings show that smaller language models can reliably detect expressions of participation (weighted F1=0.71) and rival larger models in capturing nuanced levels of participation. By applying our methodology to Reddit, we illustrate its effectiveness as a robust tool for characterizing online communities in ways that differ from topic modeling, stance detection, and keyword-based methods. Our framework contributes to Computational Social Science research by providing a new source of reliable annotations useful for investigating the social dynamics of collective action.
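
For readers unfamiliar with the metric, the weighted F1 quoted above is the average of per-class F1 scores, each weighted by the number of true examples in that class. The following is a minimal sketch of that computation and not the authors' evaluation code. The label names (including a "none" class for comments that express no participation) and the toy labels are assumptions made up for illustration and are not taken from the paper's dataset.

```python
from collections import Counter

def per_class_f1(y_true, y_pred, label):
    """F1 for one class, computed as 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def weighted_f1(y_true, y_pred):
    """Average the per-class F1 scores, weighting each class by its support."""
    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    return sum(
        per_class_f1(y_true, y_pred, label) * count / total
        for label, count in support.items()
    )

# Toy example with hypothetical label names (not from the paper).
y_true = ["none", "none", "problem", "problem",
          "call_to_action", "intention", "action", "action"]
y_pred = ["none", "problem", "problem", "problem",
          "call_to_action", "action", "action", "action"]

score = weighted_f1(y_true, y_pred)
print(f"weighted F1 = {score:.2f}")
```

Because classes are weighted by support, frequent classes dominate the score; a rare level that is always missed (here, "intention") lowers the result only in proportion to how often it occurs.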