🤖 AI Summary
Masculine generics in French can perpetuate and amplify gender stereotypes, yet automatic gender neutralization (as opposed to inclusive or gender-switching rewriting) has so far been studied only for English. This paper introduces GeNRe, the first French gender-neutral rewriting system. It replaces masculine generics with collective nouns (e.g., *le personnel*), which are gender-fixed in French. The work covers three kinds of system: (1) a rule-based system (RBS) tailored to French; (2) two language models fine-tuned on data generated by the RBS; and (3) instruct-based models, explored as a way to enhance the other systems. Among the instruct-based setups, Claude 3 Opus combined with the authors' dictionary achieves results close to those of the RBS. The authors hope this contribution will advance gender bias mitigation techniques in NLP for French.
📝 Abstract
A significant portion of the textual data used in the field of Natural Language Processing (NLP) exhibits gender biases, particularly due to the use of masculine generics (masculine words that are supposed to refer to mixed groups of men and women), which can perpetuate and amplify stereotypes. Gender rewriting, an NLP task that involves automatically detecting and replacing gendered forms with neutral or opposite forms (e.g., from masculine to feminine), can be employed to mitigate these biases. While such systems have been developed for a number of languages (English, Arabic, Portuguese, German, French), automatic gender neutralization (as opposed to inclusive or gender-switching techniques) has been studied only for English. This paper presents GeNRe, the first French gender-neutral rewriting system using collective nouns, which are gender-fixed in French. We introduce a rule-based system (RBS) tailored for the French language alongside two fine-tuned language models trained on data generated by our RBS. We also explore the use of instruct-based models to enhance the performance of our other systems and find that Claude 3 Opus combined with our dictionary achieves results close to our RBS. Through this contribution, we hope to promote the advancement of gender bias mitigation techniques in NLP for French.
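To make the collective-noun idea concrete, here is a minimal Python sketch of a dictionary lookup that swaps a definite masculine plural for a collective noun. It is an illustration under stated assumptions, not the GeNRe rule-based system: the `COLLECTIVES` entries and the `neutralize` function are hypothetical, and the sketch deliberately ignores the agreement changes a real system must handle (for example, *les employés sont* would have to become *le personnel est*).

```python
import re

# Hypothetical toy dictionary: masculine plural generic -> collective noun phrase.
# These entries are illustrative assumptions, not taken from the GeNRe dictionary.
COLLECTIVES = {
    "employés": "le personnel",
    "soldats": "l'armée",
    "policiers": "la police",
}

# Match only the definite plural "les <noun>"; indefinite "des <noun>" is left
# alone because swapping it for a collective would change the meaning.
_PATTERN = re.compile(
    r"\bles\s+(" + "|".join(map(re.escape, COLLECTIVES)) + r")\b",
    flags=re.IGNORECASE,
)


def neutralize(sentence: str) -> str:
    """Replace 'les <masculine generic>' with its collective noun phrase.

    Verb, adjective, and pronoun agreement are NOT adjusted here.
    """

    def _swap(match: re.Match) -> str:
        replacement = COLLECTIVES[match.group(1).lower()]
        # Preserve sentence-initial capitalization ("Les ..." -> "Le ...").
        if match.group(0)[0].isupper():
            replacement = replacement[0].upper() + replacement[1:]
        return replacement

    return _PATTERN.sub(_swap, sentence)


if __name__ == "__main__":
    print(neutralize("Il remercie les soldats."))
```

A lookup like this only covers the detection and substitution step. The agreement repair that follows is where most of the language-specific work lies.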