Online Social Support Detection in Spanish Social Media Texts

📅 2025-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scarcity of research on online social support detection in Spanish-language social media. We introduce the first annotated dataset for this task, comprising 3,189 YouTube comments, and define social support detection as a computational problem with three classification tasks: support vs. non-support (Task 1), individual vs. group support (Task 2), and the targeted group (Task 3: Nation, Other, LGBTQ, Black Community, Women, Religion). To mitigate class imbalance, we use GPT-4o to generate paraphrased comments and build a balanced dataset. We evaluate traditional machine learning models, LSTM, BERT, and GPT-4o zero-shot inference on the unbalanced dataset, then use a transformer model to compare the balanced and unbalanced datasets. Balancing improves results on Tasks 2 and 3, while GPT-4o achieves the best performance among the evaluated models on the binary Task 1 (support vs. non-support). This study fills a gap in positive-content analysis for Spanish, providing a foundational dataset and a methodological framework for work on healthier online communities.
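
The summary does not spell out how the balancing works. The following is a minimal sketch of paraphrase-based oversampling, assuming every class is topped up to the size of the largest class by paraphrasing randomly chosen examples of that class. `balance_by_paraphrase` and the `paraphrase` callable are hypothetical names: `paraphrase` stands in for a GPT-4o request, and no prompt or API call from the paper is reproduced here.

```python
import random
from collections import Counter
from typing import Callable, List, Tuple

Example = Tuple[str, str]  # (comment text, label)


def balance_by_paraphrase(
    data: List[Example],
    paraphrase: Callable[[str], str],
    seed: int = 0,
) -> List[Example]:
    """Oversample each minority class with paraphrases until every class
    matches the majority-class count.

    `paraphrase` is a hypothetical stand-in for an LLM call (e.g. GPT-4o);
    this is an illustrative sketch, not the authors' implementation.
    """
    rng = random.Random(seed)
    counts = Counter(label for _, label in data)
    target = max(counts.values())
    balanced = list(data)  # keep all original comments
    for label, n in counts.items():
        pool = [text for text, lab in data if lab == label]
        for _ in range(target - n):
            source = rng.choice(pool)  # pick a real comment to rephrase
            balanced.append((paraphrase(source), label))
    return balanced


def demo_paraphrase(text: str) -> str:
    # Dummy paraphraser for the demo; a real one would query a language model.
    return text + " (parafraseado)"


if __name__ == "__main__":
    toy = [
        ("¡Ánimo, estamos contigo!", "support"),
        ("No me interesa.", "non-support"),
        ("Qué aburrido.", "non-support"),
        ("Vaya vídeo.", "non-support"),
    ]
    result = balance_by_paraphrase(toy, demo_paraphrase)
    # Both classes now have the same number of examples.
    print(Counter(label for _, label in result))
```

As a general caution, synthetic paraphrases are usually added only to the training split so that evaluation uses real comments alone; the summary does not say how the authors handled this.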

📝 Abstract
The advent of social media has transformed communication, enabling individuals to share their experiences, seek support, and participate in diverse discussions. While extensive research has focused on identifying harmful content such as hate speech, the recognition and promotion of positive, supportive interactions remain largely unexplored. This study proposes an approach to detecting online social support in Spanish-language social media texts. We introduce the first annotated dataset created specifically for this task, comprising 3,189 YouTube comments classified as supportive or non-supportive. To address data imbalance, we employed GPT-4o to generate paraphrased comments and create a balanced dataset. We then evaluated social support classification with traditional machine learning models, deep learning architectures, and transformer-based models, including GPT-4o; this comparison used only the unbalanced dataset. Next, we used a transformer model to compare performance on the balanced and unbalanced datasets. Our findings indicate that the balanced dataset yielded improved results for Task 2 (Individual vs. Group) and Task 3 (Nation, Other, LGBTQ, Black Community, Women, Religion), whereas GPT-4o performed best for Task 1 (Social Support vs. Non-Support). This study highlights the importance of fostering a supportive online environment and lays the groundwork for future research in automated social support detection.
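
The three tasks named above can be represented as a small label schema. This Python sketch copies the label sets from the abstract; the nesting rules (Task 2 labels only on supportive comments, Task 3 labels only on group-directed support) are an assumption for illustration, since the abstract does not state how the tasks relate. `Annotation` and `validate` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Label sets as listed in the abstract.
TASK1_LABELS = {"Social Support", "Non-Support"}
TASK2_LABELS = {"Individual", "Group"}
TASK3_LABELS = {"Nation", "Other", "LGBTQ", "Black Community", "Women", "Religion"}


@dataclass(frozen=True)
class Annotation:
    """One annotated comment (hypothetical schema, not the released format)."""

    text: str
    task1: str
    task2: Optional[str] = None
    task3: Optional[str] = None

    def validate(self) -> None:
        """Raise ValueError if the labels break the ASSUMED task nesting."""
        if self.task1 not in TASK1_LABELS:
            raise ValueError(f"unknown Task 1 label: {self.task1!r}")
        if self.task1 == "Non-Support":
            # Assumed: finer-grained labels apply only to supportive comments.
            if self.task2 is not None or self.task3 is not None:
                raise ValueError("non-supportive comments carry no Task 2/3 labels")
            return
        if self.task2 not in TASK2_LABELS:
            raise ValueError(f"unknown Task 2 label: {self.task2!r}")
        if self.task2 == "Group":
            # Assumed: Task 3 names the group that group-directed support targets.
            if self.task3 not in TASK3_LABELS:
                raise ValueError(f"unknown Task 3 label: {self.task3!r}")
        elif self.task3 is not None:
            raise ValueError("Task 3 labels apply only to group-directed support")


if __name__ == "__main__":
    Annotation("¡Fuerza, México!", "Social Support", "Group", "Nation").validate()
    Annotation("Qué aburrido.", "Non-Support").validate()
    print("both annotations are consistent with the assumed schema")
```

Under this reading, Task 1 is trained on every comment, Task 2 on supportive comments, and Task 3 on group-directed ones, which would shrink the training set at each level and make imbalance most acute for Task 3.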
Problem

Detecting online social support in Spanish texts
Creating a balanced dataset using GPT-4o
Comparing model performance on balanced and unbalanced datasets

Innovation

GPT-4o-generated paraphrases to balance the dataset
Transformer-based comparison of balanced and unbalanced datasets
First annotated dataset for Spanish social support