Dataset Creation and Baseline Models for Sexism Detection in Hausa

📅 2025-10-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the scarcity of gender bias detection research for low-resource languages like Hausa, this study constructs the first high-quality, culturally grounded Hausa annotated dataset for gender discrimination detection, leveraging community collaboration, qualitative coding, and data augmentation. Methodologically, it introduces a two-stage user study—including in-depth interviews with native speakers and iterative annotation feedback—to systematically identify and model culture-specific linguistic phenomena (e.g., clarification requests, idiomatic misinterpretations). It further benchmarks traditional machine learning and multilingual pretrained models under few-shot settings. Results show that while multilingual models exhibit baseline discrimination detection capability, their lack of cultural contextualization leads to substantial false positives. This work fills a critical gap in computational gender bias research for Hausa and proposes a “culture-embedded data construction” paradigm—a reusable methodological framework for bias detection in low-resource languages.

📝 Abstract
Sexism reinforces gender inequality and social exclusion by perpetuating stereotypes, bias, and discriminatory norms. As online platforms enable various forms of sexism to thrive, there is a growing need for effective sexism detection and mitigation strategies. While computational approaches to sexism detection are widespread in high-resource languages, progress remains limited in low-resource languages, where scarce linguistic resources and cultural differences affect how sexism is expressed and perceived. This study introduces the first Hausa sexism detection dataset, developed through community engagement, qualitative coding, and data augmentation. To capture cultural nuances and linguistic representation, we conducted a two-stage user study (n=66) with native speakers to explore how sexism is defined and articulated in everyday discourse. We further experiment with both traditional machine learning classifiers and pre-trained multilingual language models, and evaluate the effectiveness of few-shot learning in detecting sexism in Hausa. Our findings highlight challenges in capturing cultural nuance, particularly with clarification-seeking and idiomatic expressions, and reveal a tendency toward false positives in such cases.
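The abstract mentions traditional machine learning classifiers as one family of baselines. A minimal sketch of such a baseline is shown below, assuming a binary-labeled text corpus; the feature choices (character n-gram TF-IDF with logistic regression) and the placeholder strings are illustrative assumptions, not the paper's actual configuration or data.

```python
# Hedged sketch of a traditional ML baseline for binary sexism
# detection. Placeholder texts stand in for real labeled Hausa data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder corpus: 1 = sexist, 0 = not sexist (illustrative only).
texts = [
    "placeholder sexist remark",
    "placeholder neutral statement",
    "another placeholder sexist remark",
    "another placeholder neutral statement",
]
labels = [1, 0, 1, 0]

clf = make_pipeline(
    # Character n-grams within word boundaries can be a reasonable
    # choice for languages with limited tokenization resources.
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(max_iter=1000),
)
clf.fit(texts, labels)
preds = clf.predict(texts)
```

In practice one would evaluate on a held-out split with precision/recall, since false positives are the failure mode the paper highlights.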
Problem

Research questions and friction points this paper is trying to address.

Creating the first annotated dataset for detecting sexism in Hausa language
Addressing cultural nuances in sexism expression through community engagement
Evaluating machine learning models for low-resource language sexism detection
Innovation

Methods, ideas, or system contributions that make the work stand out.

Developed first Hausa sexism dataset through community engagement
Conducted two-stage native speaker study for cultural nuances
Evaluated few-shot learning with multilingual models for detection
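The few-shot evaluation above is typically driven by a prompt that interleaves labeled examples with the query. The paper does not specify its prompt format, so the template, label names, and example strings below are illustrative assumptions only.

```python
# Hedged sketch: constructing a few-shot classification prompt for a
# multilingual language model. Template and labels are assumptions.
def build_few_shot_prompt(examples, query):
    """examples: list of (text, label) pairs; query: text to classify."""
    lines = ["Classify each Hausa sentence as 'sexist' or 'not sexist'.", ""]
    for text, label in examples:
        lines.append(f"Sentence: {text}")
        lines.append(f"Label: {label}")
        lines.append("")
    # The model is expected to complete the final label.
    lines.append(f"Sentence: {query}")
    lines.append("Label:")
    return "\n".join(lines)

shots = [
    ("placeholder sexist example", "sexist"),
    ("placeholder neutral example", "not sexist"),
]
prompt = build_few_shot_prompt(shots, "placeholder query sentence")
```

The resulting string would be passed to the model's generation API; varying the number of shots gives the few-shot settings the study compares.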
Fatima Adam Muhammad
Federal University Dutse
Shamsuddeen Muhammad Hassan
Imperial College London
Isa Inuwa-Dutse
Senior Lecturer in Computer Science, University of Huddersfield, UK
Machine Learning · AI Safety · Explainable AI · Text Mining · Social Network Analysis