🤖 AI Summary
This study addresses the loss of explicit gender information in English-to-Hindi machine translation, where Hindi ergative and honorific constructions can erase a gender cue that the English source states explicitly. It treats gender recoverability as a concrete criterion for successful cultural translation and evaluates five systems on a 37,345-instance benchmark spanning twelve categories. The authors propose two mechanism-aware, inference-time interventions: a Source-Aware Reranker (SAR), which prefers candidates that avoid gender-neutralizing syntax, and a Phenomenon-Aware Reranker (PAR), which preserves gender through targeted lexical marking even when ergative syntax remains. With GPT-4o-mini and Sarvam, PAR raises target-subset gender accuracy from 11.07% to 54.47% and from 15.99% to 49.66%, respectively. In human evaluation, PAR increases gender preservation from 10.3% to 81.3% but lowers mean fluency from 4.36 to 3.37, which exposes a trade-off between fluency and fidelity to source gender.
📝 Abstract
Generative translation systems are cultural technologies because they decide how socially meaningful cues are rendered within culturally specific grammatical systems. We study one concrete notion of successful cultural translation: when an English source explicitly encodes gender, an English-to-Hindi translation should preserve the recoverability of that cue unless the source itself is ambiguous. We evaluate this criterion on a 37,345-instance benchmark spanning twelve categories and show that five systems frequently erase gender through ergative and honorific constructions. We then introduce two mechanism-aware inference-time interventions. The first, the Source-Aware Reranker (SAR), prefers candidates that avoid gender-neutralizing syntax. The second, the Phenomenon-Aware Reranker (PAR), preserves gender through targeted lexical marking even when ergative syntax remains. Across GPT-4o-mini and Sarvam, PAR improves target-subset accuracy from 11.07% to 54.47% and from 15.99% to 49.66%, respectively. Human evaluation shows that PAR increases gender preservation from 10.3% to 81.3%, but reduces mean fluency from 4.36 to 3.37. These findings place the two interventions on a preservation-fluency frontier rather than supporting a single dominant solution, and show how culturally situated generation can require explicit tradeoffs among fidelity, fluency, and stylistic naturalness.
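The difference between the two rerankers described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Candidate` fields, the additive `penalty` and `bonus` terms, and the example scores are all hypothetical, and the two boolean flags are assumed to come from some upstream annotator that the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    text: str                  # candidate translation (placeholder labels here)
    model_score: float         # hypothetical base score, higher is better
    neutralizing_syntax: bool  # hypothetical flag: construction hides gender
    gender_recoverable: bool   # hypothetical flag: source gender is still readable


def sar_rerank(cands, penalty=1.0):
    """SAR-style ordering: penalize candidates that use gender-neutralizing syntax."""
    return sorted(
        cands,
        key=lambda c: c.model_score - penalty * c.neutralizing_syntax,
        reverse=True,
    )


def par_rerank(cands, bonus=1.0):
    """PAR-style ordering: reward recoverable gender, whatever the syntax."""
    return sorted(
        cands,
        key=lambda c: c.model_score + bonus * c.gender_recoverable,
        reverse=True,
    )


# Illustrative candidates only; the scores and flags are made up.
CANDS = [
    Candidate("A", 0.9, neutralizing_syntax=True, gender_recoverable=False),
    Candidate("B", 0.6, neutralizing_syntax=False, gender_recoverable=True),
    Candidate("C", 0.7, neutralizing_syntax=True, gender_recoverable=True),
]

sar_best = sar_rerank(CANDS)[0]
par_best = par_rerank(CANDS)[0]
```

With these made-up values, the SAR-style ordering selects candidate B, the only one that avoids the neutralizing construction, while the PAR-style ordering selects candidate C, which keeps the neutralizing construction but still marks gender. Neither ordering looks at fluency, which is one way to read the preservation and fluency trade-off reported in the human evaluation.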