🤖 AI Summary
This study investigates how explicit gender marking in occupational titles within grammatical-gender languages induces gender bias in automated ranking systems. We propose a quantitative evaluation framework grounded in rank-similarity metrics, particularly Rank-Biased Overlap (RBO), and release a multilingual occupational-title matching benchmark covering four grammatical-gender languages (Spanish, French, German, Russian), annotated with gender and matching-relevance labels. Evaluating off-the-shelf multilingual pretrained models on the title-matching task, we find that all of the models tested exhibit varying degrees of gender bias in their ranking outputs. These results show that grammatical gender can materially affect ranking fairness. Our work provides a methodology and baselines for assessing gender bias in occupational ranking, enabling reproducible evaluation across languages and models.
📝 Abstract
This work lays the groundwork for studying how explicit grammatical gender marking in job titles can affect the results of automatic job ranking systems. To evaluate gender bias in job title ranking systems, we propose using rank-comparison metrics that control for gender, in particular RBO (Rank-Biased Overlap). We generate and share test sets for a job title matching task in four grammatical gender languages, covering occupations in both masculine and feminine form and annotated with gender and matching relevance. Using the new test sets and the proposed methodology, we evaluate several out-of-the-box multilingual models as baselines, showing that all of them exhibit varying degrees of gender bias.
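To make the proposed metric concrete, below is a minimal sketch of truncated Rank-Biased Overlap in Python. This is an illustration of the standard RBO definition (a geometrically weighted average of top-d agreement), not the paper's own code; the function name and the toy candidate rankings are hypothetical.

```python
def rbo_truncated(ranking_a, ranking_b, p=0.9):
    """Truncated Rank-Biased Overlap between two ranked lists.

    At each depth d, agreement is the fraction of the top-d items the
    two rankings share; depths are weighted geometrically by p, so the
    top of the list dominates. Summing over a finite prefix yields a
    lower bound on the full (infinite-depth) RBO score. Identical
    rankings of length k score 1 - p**k; disjoint rankings score 0.
    """
    k = min(len(ranking_a), len(ranking_b))
    score = 0.0
    for d in range(1, k + 1):
        agreement = len(set(ranking_a[:d]) & set(ranking_b[:d])) / d
        score += (1 - p) * p ** (d - 1) * agreement
    return score

# Hypothetical use: compare the rankings returned for the masculine-form
# and feminine-form variants of the same job-title query. A score close
# to the identical-list maximum indicates the gendered query forms were
# treated alike; lower scores signal gender-dependent divergence.
masc = ["candidate_1", "candidate_2", "candidate_3"]
fem = ["candidate_2", "candidate_1", "candidate_3"]
print(rbo_truncated(masc, fem))
```

In a bias evaluation, scores like this would be aggregated over many query pairs and compared against a gender-controlled baseline, which is the role the rank-comparison methodology plays in the abstract above.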