Translate With Care: Addressing Gender Bias, Neutrality, and Reasoning in Large Language Model Translations

📅 2025-05-31

🤖 AI Summary

This work addresses gender bias and logical inconsistency in machine translation between natural-gender languages (e.g., English) and genderless languages (e.g., Persian, Indonesian, Finnish). We introduce TWC (Translate-with-Care), a benchmark of 3,950 challenging scenarios across six low- to mid-resource languages. Evaluating GPT-4, mBART-50, NLLB-200, and Google Translate on genderless source content shows that all of them prefer masculine pronouns when gender stereotypes can influence the choice. The bias is strongest in leadership and professional-success contexts, where Google Translate and GPT-4 favor masculine pronouns 4-6 times more often than feminine ones. Fine-tuning mBART-50 on TWC substantially reduces these biases and the accompanying reasoning errors, generalizes well, and outperforms the proprietary systems while remaining open-source.

📝 Abstract

Addressing gender bias and maintaining logical coherence in machine translation remains challenging, particularly when translating between natural gender languages, like English, and genderless languages, such as Persian, Indonesian, and Finnish. We introduce the Translate-with-Care (TWC) dataset, comprising 3,950 challenging scenarios across six low- to mid-resource languages, to assess translation systems' performance. Our analysis of diverse technologies, including GPT-4, mBART-50, NLLB-200, and Google Translate, reveals a universal struggle in translating genderless content, resulting in gender stereotyping and reasoning errors. All models preferred masculine pronouns when gender stereotypes could influence choices. Google Translate and GPT-4 showed particularly strong bias, favoring male pronouns 4-6 times more than feminine ones in leadership and professional success contexts. Fine-tuning mBART-50 on TWC substantially resolved these biases and errors, led to strong generalization, and surpassed proprietary LLMs while remaining open-source. This work emphasizes the need for targeted approaches to gender and semantic coherence in machine translation, particularly for genderless languages, contributing to more equitable and accurate translation systems.
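To make the "4-6 times" figure concrete, the following is a minimal sketch of how a masculine-to-feminine pronoun ratio could be computed over a batch of English translations. It is an illustration only, not the paper's evaluation code: the pronoun lists, the function names `pronoun_counts` and `masculine_to_feminine_ratio`, and the sample sentences are all assumptions made up for this example.

```python
import re
from collections import Counter

# Assumed pronoun lists; the paper's actual criteria may differ.
MASCULINE = {"he", "him", "his", "himself"}
FEMININE = {"she", "her", "hers", "herself"}


def pronoun_counts(translations):
    """Count masculine and feminine pronouns across English translations."""
    counts = Counter()
    for text in translations:
        # Tokenize on letters so "the" or "there" never match "he" or "her".
        for token in re.findall(r"[a-z']+", text.lower()):
            if token in MASCULINE:
                counts["masculine"] += 1
            elif token in FEMININE:
                counts["feminine"] += 1
    return counts


def masculine_to_feminine_ratio(translations):
    """Return masculine/feminine pronoun ratio, or None if no pronouns occur."""
    counts = pronoun_counts(translations)
    if counts["feminine"] == 0:
        return float("inf") if counts["masculine"] else None
    return counts["masculine"] / counts["feminine"]


# Hypothetical system outputs for genderless source sentences.
sample_outputs = [
    "The doctor said he would call back.",
    "The manager finished his report.",
    "The engineer said she fixed the bug.",
    "The director thanked him for the work.",
]

ratio = masculine_to_feminine_ratio(sample_outputs)
print(f"masculine:feminine ratio = {ratio}")
```

A ratio near 1 would indicate balanced pronoun choice on such a batch, while values in the 4-6 range would correspond to the skew reported for Google Translate and GPT-4. A surface count like this ignores context, so it cannot tell whether a given pronoun was actually licensed by the source sentence.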

Problem

Research questions and friction points this paper is trying to address.

Addressing gender bias in translations between gendered and genderless languages

Evaluating translation systems' performance on gender neutrality and logical coherence

Reducing gender stereotyping and reasoning errors in machine translation models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introducing Translate-with-Care dataset for bias assessment

Fine-tuning mBART-50 to reduce gender bias effectively

Open-source solution surpassing proprietary LLMs in performance
