GS_DravidianLangTech@2025: Women Targeted Abusive Texts Detection on Social Media

📅 2025-04-01
🤖 AI Summary
This study addresses the detection of abusive text targeting women, including hate speech, derogatory language, and threats, in Tamil and Malayalam social media, two low-resource Dravidian languages. The authors use logistic regression and BERT as base models, trained and evaluated on the annotated DravidianLangTech@2025 datasets. The reported macro-F1 scores are 0.729 for BERT and 0.6279 for logistic regression, for Tamil and Malayalam respectively. The work contributes benchmark results for detecting gendered abuse in South Indian languages.

📝 Abstract
The increasing misuse of social media has become a concern; however, technological solutions are being developed to moderate its content effectively. This paper focuses on detecting abusive texts targeting women on social media platforms. Abusive speech refers to communication intended to harm or incite hatred against vulnerable individuals or groups. Specifically, this study aims to identify abusive language directed toward women. To achieve this, we used logistic regression and BERT as base models, trained on datasets from DravidianLangTech@2025 for the Tamil and Malayalam languages. The models were evaluated on test datasets, resulting in a 0.729 macro F1 score for BERT and 0.6279 for logistic regression in Tamil and Malayalam, respectively.
Problem

Research questions and friction points this paper is trying to address.

Detect abusive texts targeting women on social media
Identify harmful language in Tamil and Malayalam
Evaluate models for abusive speech detection performance
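The evaluation metric named in the abstract, macro F1, is the unweighted mean of the per-class F1 scores, so a minority class counts as much as a majority class. The sketch below is a minimal pure-Python illustration, not the authors' evaluation script; the label names and the toy predictions are assumptions made up for the example.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Return the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels for illustration only (hypothetical, not from the dataset)
y_true = ["abusive", "abusive", "non-abusive", "non-abusive", "non-abusive"]
y_pred = ["abusive", "non-abusive", "non-abusive", "non-abusive", "abusive"]
score = macro_f1(y_true, y_pred)
```

In practice a library routine such as scikit-learn's `f1_score` with `average="macro"` would normally be used instead of a hand-written function.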
Innovation

Methods, ideas, or system contributions that make the work stand out.

Used logistic regression for abusive text detection
Employed BERT model for enhanced accuracy
Trained on DravidianLangTech@2025 Tamil and Malayalam datasets
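As a rough illustration of the logistic regression baseline, the sketch below fits a binary classifier on bag-of-words counts with plain stochastic gradient descent. The feature representation, the hyperparameters, and the English placeholder sentences are all assumptions for the example; the summary does not state how the authors built their features or trained their model.

```python
import math


def tokenize(text):
    # Whitespace tokenization is an assumption; real Tamil or Malayalam
    # text would likely need language-aware preprocessing.
    return text.lower().split()


def sigmoid(z):
    # Clip z to keep math.exp from overflowing
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(texts, labels, epochs=200, lr=0.5):
    """Fit binary logistic regression on bag-of-words counts with SGD."""
    weights = {}  # token -> weight
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            tokens = tokenize(text)
            p = sigmoid(bias + sum(weights.get(t, 0.0) for t in tokens))
            grad = p - y  # derivative of log loss with respect to the logit
            bias -= lr * grad
            for t in tokens:  # repeated tokens contribute once per occurrence
                weights[t] = weights.get(t, 0.0) - lr * grad
    return weights, bias


def predict_proba(text, weights, bias):
    """Probability that the text belongs to class 1 (abusive)."""
    return sigmoid(bias + sum(weights.get(t, 0.0) for t in tokenize(text)))


# Hypothetical placeholder data: 1 = abusive, 0 = non-abusive
texts = [
    "you are stupid and worthless",
    "nobody wants you here",
    "have a wonderful day",
    "thank you for sharing this",
]
labels = [1, 1, 0, 0]
weights, bias = train_logreg(texts, labels)
p_abusive = predict_proba(texts[0], weights, bias)
p_benign = predict_proba(texts[2], weights, bias)
```

A fine-tuned BERT classifier replaces the hand-built token counts with contextual representations learned from the text.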
G. Bade
Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Miguel Othón de Mendizábal, Ciudad de México, 07320, México.
Z. Ahani
Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Miguel Othón de Mendizábal, Ciudad de México, 07320, México.
Olga Kolesnikova
Centro de Investigación en Computación (CIC) del Instituto Politécnico Nacional, Mexico
Artificial Intelligence, Natural Language Processing, Linguistics
J. Oropeza
Centro de Investigación en Computación (CIC), Instituto Politécnico Nacional (IPN), Miguel Othón de Mendizábal, Ciudad de México, 07320, México.
Grigori Sidorov
Professor of Computational Linguistics, Instituto Politécnico Nacional (IPN), Mexico
Computational Linguistics, Natural Language Processing, Artificial Intelligence, Machine Learning