Assessing Text Classification Methods for Cyberbullying Detection on Social Media Platforms

📅 2024-12-27
🤖 AI Summary
This study addresses poor real-time performance, low data quality, high computational overhead, and limited generalization in social media cyberbullying detection by comparing existing text classification models with real-world deployment in mind. It evaluates five transformer-based models (BERT, RoBERTa, XLNet, DistilBERT, and GPT-2.0) on cyberbullying identification across accuracy, F1-score, inference latency, memory footprint, and energy consumption. Fine-tuned BERT achieves 95% accuracy and F1-score with a per-instance inference time of 0.053 seconds, memory usage of 35.28 MB, and energy consumption of 0.000263 kWh, striking a balance between performance, time efficiency, and computational resources. The findings indicate that generative models do not consistently outperform fine-tuned models on the tested benchmarks, and that adapting and fine-tuning existing models is a practical route to deploying cyberbullying detection on resource-constrained social platforms.

📝 Abstract
Cyberbullying significantly contributes to mental health issues in communities by negatively impacting the psychology of victims. It is a prevalent problem on social media platforms, necessitating effective, real-time detection and monitoring systems to identify harmful messages. However, current cyberbullying detection systems face challenges related to performance, dataset quality, time efficiency, and computational costs. This research aims to conduct a comparative study by adapting and evaluating existing text classification techniques within the cyberbullying detection domain. The study specifically evaluates the effectiveness and performance of these techniques in identifying cyberbullying instances on social media platforms. It focuses on leveraging and assessing large language models, including BERT, RoBERTa, XLNet, DistilBERT, and GPT-2.0, for their suitability in this domain. The results show that BERT strikes a balance between performance, time efficiency, and computational resources: Accuracy of 95%, Precision of 95%, Recall of 95%, F1 Score of 95%, Error Rate of 5%, Inference Time of 0.053 seconds, RAM Usage of 35.28 MB, CPU/GPU Usage of 0.4%, and Energy Consumption of 0.000263 kWh. The findings demonstrate that generative AI models, while powerful, do not consistently outperform fine-tuned models on the tested benchmarks. However, state-of-the-art performance can still be achieved through strategic adaptation and fine-tuning of existing models for specific datasets and tasks.
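
The metrics listed in the abstract can be illustrated with a small, standard-library-only sketch. It is not the authors' evaluation code: it assumes binary labels (1 = cyberbullying, 0 = not), and `binary_metrics`, `mean_inference_time`, and `keyword_classifier` are hypothetical names introduced here for illustration. The keyword rule merely stands in for a fine-tuned model so that the example runs without any model weights.

```python
import time
from typing import Callable, Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, F1, and error rate for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "error_rate": 1.0 - accuracy,  # complement of accuracy
    }


def mean_inference_time(predict: Callable[[str], int], texts: Sequence[str]) -> float:
    """Average wall-clock seconds per input for a prediction function."""
    start = time.perf_counter()
    for text in texts:
        predict(text)
    return (time.perf_counter() - start) / len(texts)


def keyword_classifier(text: str) -> int:
    """Hypothetical stand-in for a fine-tuned model: flags two toy keywords."""
    return int(any(word in text.lower() for word in ("stupid", "loser")))


if __name__ == "__main__":
    # Toy data, invented for this example only.
    texts = ["You are such a loser", "See you at practice", "Nice work today"]
    labels = [1, 0, 0]
    preds = [keyword_classifier(t) for t in texts]
    print(binary_metrics(labels, preds))
    print(f"{mean_inference_time(keyword_classifier, texts):.6f} s per instance")
```

The error rate is defined here as the complement of accuracy, which matches the 95% accuracy and 5% error rate reported in the abstract. Whether the reported precision, recall, and F1 are binary or averaged across classes is not stated in the abstract, so the binary form above is an assumption.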
Problem

Research questions and friction points this paper is trying to address.

cyberbullying detection
text classification
efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Traditional Tokenization Optimization
BERT Model
Cyberbullying Detection
Adamu Gaston Philipo
School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
Doreen Sebastian Sarwatt
School of Computer and Communication Engineering, University of Science and Technology Beijing, Beijing, China
Jianguo Ding
Blekinge Institute of Technology
cybersecurity, AI, blockchain, metaverse, critical infrastructure protection
Mahmoud Daneshmand
Department of Business Intelligence and Analytics and the Department of Computer Science, Stevens Institute of Technology, Hoboken, NJ, USA
Huansheng Ning
University of Science and Technology Beijing (北京科技大学)
Ubiquitous IoT, Cyberspace & CyberHealth, Cyberphilosophy & Cyberism, C-P-S-T AI