Dual-Metric Evaluation of Social Bias in Large Language Models: Evidence from an Underrepresented Nepali Cultural Context

📅 2026-03-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the underexplored issue of implicit social biases in large language models (LLMs) within underrepresented cultural contexts such as Nepal, where systematic evaluation methods are lacking. The authors present the first Nepali cultural dataset compatible with the Croissant framework, comprising over 2,400 pairs of stereotypical and counter-stereotypical sentences, and introduce a dual-metric bias assessment (DMBA) framework that separately quantifies explicit agreement with biased statements and implicit stereotypical tendencies in model generations. Experiments reveal significant biases across seven mainstream LLMs, with implicit bias rates ranging from 0.740 to 0.755—particularly pronounced in ethnic and sociocultural domains. Notably, explicit and implicit biases exhibit only weak correlation, and while implicit bias follows a nonlinear U-shaped trend with temperature variation, it remains relatively stable across different top-p values.

Technology Category

Application Category

📝 Abstract

Large language models (LLMs) increasingly influence global digital ecosystems, yet their potential to perpetuate social and cultural biases remains poorly understood in underrepresented contexts. This study presents a systematic analysis of representational biases in seven state-of-the-art LLMs: GPT-4o-mini, Claude-3-Sonnet, Claude-4-Sonnet, Gemini-2.0-Flash, Gemini-2.0-Lite, Llama-3-70B, and Mistral-Nemo in the Nepali cultural context. Using Croissant-compliant dataset of 2400+ stereotypical and anti-stereotypical sentence pairs on gender roles across social domains, we implement an evaluation framework, Dual-Metric Bias Assessment (DMBA), combining two metrics: (1) agreement with biased statements and (2) stereotypical completion tendencies. Results show models exhibit measurable explicit agreement bias, with mean bias agreement ranging from 0.36 to 0.43 across decoding configurations, and an implicit completion bias rate of 0.740-0.755. Importantly, implicit completion bias follows a non-linear, U-shaped relationship with temperature, peaking at moderate stochasticity (T=0.3) and declining slightly at higher temperatures. Correlation analysis under different decoding settings revealed that explicit agreement strongly aligns with stereotypical sentence agreement but is a weak and often negative predictor of implicit completion bias, indicating generative bias is poorly captured by agreement metrics. Sensitivity analysis shows increasing top-p amplifies explicit bias, while implicit generative bias remains largely stable. Domain-level analysis shows implicit bias is strongest for race and sociocultural stereotypes, while explicit agreement bias is similar across gender and sociocultural categories, with race showing the lowest explicit agreement. These findings highlight the need for culturally grounded datasets and debiasing strategies for LLMs in underrepresented societies.

Problem

Research questions and friction points this paper is trying to address.

social bias

large language models

underrepresented contexts

cultural bias

Nepali cultural context

Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-Metric Bias Assessment

implicit generative bias

underrepresented cultural context

Croissant-compliant dataset

stereotypical completion

🔎 Similar Papers

From Prejudice to Parity: A New Approach to Debiasing Large Language Model Word Embeddings

2024-02-18arXiv.orgCitations: 0

Authors to Follow