🤖 AI Summary
This work investigates the systematic overuse of masculine generics (MG) and implicit gender bias in French general instruction responses generated by large language models (LLMs). To address this, we construct the first manually annotated French human noun dataset, derived from a curated set of general instructions, and evaluate MG bias across six open- and closed-source LLMs using human annotation and statistical analysis. Results reveal that approximately 39.5% of all general instruction responses exhibit MG bias; among responses containing human nouns, the bias rate rises to 73.1%, with models rarely adopting gender-inclusive alternatives proactively. Our key contributions include: (1) a novel quantitative framework for measuring MG bias; (2) the first French MG benchmark dataset; and (3) empirical evidence that LLMs significantly reinforce traditional gender norms in generic contexts—thereby establishing a foundation for bias detection and debiasing methodologies in multilingual NLP.
📝 Abstract
Large language models (LLMs) have been shown to propagate and even amplify gender bias, in English and other languages, in specific or constrained contexts. However, no studies so far have focused on the gender biases conveyed by LLMs' responses to generic instructions, especially with regard to masculine generics (MG). MG are a linguistic feature found in many gender-marked languages, denoting the use of the masculine gender as a "default" or supposedly neutral gender to refer to mixed groups of men and women, or to a person whose gender is irrelevant or unknown. Numerous psycholinguistic studies have shown that MG are not neutral and induce gender bias. This work aims to analyze the use of MG by both proprietary and local LLMs in responses to generic instructions and to evaluate their MG bias rate. We focus on French and create a human noun database from existing lexical resources. We filter existing French instruction datasets to retrieve generic instructions and analyze the responses of 6 different LLMs. Overall, we find that ≈39.5% of LLMs' responses to generic instructions are MG-biased (≈73.1% across responses containing human nouns). Our findings also reveal that LLMs are reluctant to use gender-fair language spontaneously.
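The two bias rates reported above (overall vs. among responses containing human nouns) follow directly from the annotation scheme: each response is labeled for whether it contains a human noun and, if so, whether its use of masculine generics is biased. The sketch below illustrates that computation; the field names and the toy data are illustrative assumptions, not taken from the paper's dataset.

```python
# Minimal sketch of the two MG bias rates described in the abstract,
# assuming each response carries two hypothetical annotation fields:
# "has_human_noun" and "mg_biased" (names are illustrative).

def mg_bias_rates(responses):
    """Return (overall MG bias rate, MG bias rate among responses with human nouns)."""
    total = len(responses)
    with_nouns = [r for r in responses if r["has_human_noun"]]
    biased = [r for r in with_nouns if r["mg_biased"]]
    overall = len(biased) / total if total else 0.0
    among_nouns = len(biased) / len(with_nouns) if with_nouns else 0.0
    return overall, among_nouns

# Toy annotated sample: 4 responses contain human nouns (3 MG-biased), 2 do not.
sample = [
    {"has_human_noun": True,  "mg_biased": True},
    {"has_human_noun": True,  "mg_biased": True},
    {"has_human_noun": True,  "mg_biased": True},
    {"has_human_noun": True,  "mg_biased": False},
    {"has_human_noun": False, "mg_biased": False},
    {"has_human_noun": False, "mg_biased": False},
]
overall, among_nouns = mg_bias_rates(sample)
print(f"overall: {overall:.1%}, with human nouns: {among_nouns:.1%}")
# prints "overall: 50.0%, with human nouns: 75.0%"
```

On the paper's real annotations, the same two ratios would yield the reported ≈39.5% and ≈73.1%; the gap between them reflects that only responses containing human nouns can exhibit MG bias at all.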