🤖 AI Summary
Existing AI bias evaluation benchmarks are predominantly Western-centric, overlooking Africa’s sociocultural diversity and inadvertently reinforcing harmful stereotypes. Method: We introduce AfriStereo, the first open-source, Africa-localized stereotype dataset, covering gender, ethnicity, religion, age, and profession. It comprises more than 5,000 validated stereotype-antistereotype sentence pairs, co-developed with African communities. We propose a human-in-the-loop annotation framework integrated with semantic clustering for validation, and design the Bias Preference Ratio (BPR) as a quantitative metric of stereotypical preference in language models. Contribution/Results: A systematic evaluation of 11 large language models reveals statistically significant bias in 9 of them (BPR = 0.63–0.78, *p* ≤ 0.05); domain-specialized models exhibit markedly weaker bias, suggesting that task-specific fine-tuning may help mitigate it. This work establishes a contextualized, community-grounded paradigm for bias assessment in underrepresented regions.
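To make the metric concrete, the sketch below shows one plausible way to compute a preference ratio of this kind: score each stereotype-antistereotype pair with a causal language model, take the fraction of pairs where the stereotype receives the higher likelihood (0.5 = no preference), and run a binomial test against that baseline. The paper's exact BPR formula and significance test are not reproduced here; the model name, scoring choice, and example pair are illustrative assumptions.

```python
# Minimal sketch of a Bias Preference Ratio (BPR) style metric, assuming BPR is the
# fraction of pairs where a causal LM assigns higher likelihood to the stereotype
# than to its antistereotype. The exact AfriStereo formula may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from scipy.stats import binomtest

model_name = "gpt2"  # placeholder; substitute the LM under evaluation
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def sentence_log_likelihood(text: str) -> float:
    """Total log-probability the model assigns to a sentence."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        out = model(ids, labels=ids)
    # out.loss is the mean negative log-likelihood per predicted token
    return -out.loss.item() * (ids.shape[1] - 1)

def bias_preference_ratio(pairs):
    """pairs: iterable of (stereotype_sentence, antistereotype_sentence)."""
    pairs = list(pairs)
    prefers_stereotype = sum(
        sentence_log_likelihood(s) > sentence_log_likelihood(a) for s, a in pairs
    )
    bpr = prefers_stereotype / len(pairs)
    # Two-sided binomial test against the no-preference null of 0.5
    p_value = binomtest(prefers_stereotype, len(pairs), 0.5).pvalue
    return bpr, p_value

# Hypothetical example pair (not drawn from the AfriStereo dataset)
pairs = [("Women are bad drivers.", "Women are good drivers.")]
print(bias_preference_ratio(pairs))
```

A ratio well above 0.5 with a small p-value, as in the reported 0.63–0.78 range, would indicate a systematic preference for stereotypes under this reading of the metric.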
📝 Abstract
Existing AI bias evaluation benchmarks largely reflect Western perspectives, leaving African contexts underrepresented and allowing harmful stereotypes to go unchecked in applications across various domains. To address this gap, we introduce AfriStereo, the first open-source African stereotype dataset and evaluation framework grounded in local socio-cultural contexts. Through community-engaged efforts across Senegal, Kenya, and Nigeria, we collected 1,163 stereotypes spanning gender, ethnicity, religion, age, and profession. Using few-shot prompting with human-in-the-loop validation, we augmented the dataset to over 5,000 stereotype-antistereotype pairs. Entries were validated through semantic clustering and manual annotation by culturally informed reviewers. A preliminary evaluation of language models reveals that nine of eleven models exhibit statistically significant bias, with Bias Preference Ratios (BPR) ranging from 0.63 to 0.78 (p ≤ 0.05), indicating systematic preferences for stereotypes over antistereotypes, particularly along the age, profession, and gender dimensions. Domain-specific models appeared to show weaker bias in our setup, suggesting that task-specific training may mitigate some stereotypical associations. Looking ahead, AfriStereo opens pathways for future research on culturally grounded bias evaluation and mitigation, offering methodologies for building more equitable, context-aware, and globally inclusive NLP technologies.
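The abstract's validation pipeline pairs semantic clustering with manual review. As a rough illustration of how such a step might work, the sketch below embeds candidate entries with a sentence encoder and flags near-duplicate pairs for annotators to inspect. The embedding model, similarity threshold, and example sentences are assumptions for illustration, not the paper's exact setup.

```python
# Minimal sketch of semantic-similarity screening to support human-in-the-loop
# validation: flag near-duplicate entries so reviewers can deduplicate or merge them.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def flag_near_duplicates(sentences, threshold=0.9):
    """Return (i, j, similarity) triples whose embeddings exceed the threshold."""
    emb = embedder.encode(sentences, normalize_embeddings=True)
    sims = cosine_similarity(emb)
    flagged = []
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if sims[i, j] >= threshold:
                flagged.append((i, j, float(sims[i, j])))
    return flagged

# Hypothetical augmented entries (not from the dataset)
examples = [
    "Elders are too old to learn new technology.",
    "Older people cannot learn new technology.",
    "Nurses are always women.",
]
print(flag_near_duplicates(examples))
```

Screening of this kind only narrows the set that culturally informed reviewers must examine; the final accept/reject decision in a pipeline like the one described would remain with the human annotators.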