🤖 AI Summary
Existing multilingual large language model (LLM) debiasing methods—e.g., SentDebias—exhibit limited cross-lingual transferability and fail to consistently mitigate bias across languages.
Method: We propose performing debiasing within a unified, semantically aligned cross-lingual latent space rather than operating directly on raw model representations. Specifically, we train a cross-lingual autoencoder on parallel TED corpora to construct this aligned latent space, and apply multiple debiasing techniques to Aya-Expanse within it.
Contribution/Results: Experiments across four languages demonstrate: (1) strong cross-lingual alignment in the learned latent space; (2) significantly improved debiasing performance over baselines; and (3) effective cross-lingual generalization of bias mitigation. To our knowledge, this is the first work to jointly optimize latent-space alignment and debiasing, thereby enhancing both generalizability and consistency in multilingual fairness modeling.
📝 Abstract
Debiasing techniques such as SentDebias aim to reduce bias in large language models (LLMs). Previous studies have evaluated their cross-lingual transferability by applying these methods directly to LLM representations, revealing limited effectiveness across languages. In this work, we therefore propose to perform debiasing in a joint latent space rather than directly on LLM representations. We construct a well-aligned cross-lingual latent space using an autoencoder trained on parallel TED talk scripts. Our experiments with Aya-Expanse and two debiasing techniques across four languages (English, French, German, and Dutch) demonstrate that (a) autoencoders effectively construct a well-aligned cross-lingual latent space, and (b) applying debiasing techniques in the learned cross-lingual latent space significantly improves both overall debiasing performance and cross-lingual transferability.
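To make the core mechanism concrete, the following is a minimal sketch of SentDebias-style bias removal applied to latent vectors, as the abstract describes. Everything here is illustrative: the encoder is assumed (toy random latents stand in for autoencoder codes of counterfactual sentence pairs such as "he is a doctor" / "she is a doctor"), and the bias subspace is estimated, as in SentenceDebias, from the principal directions of the pair differences.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # latent dimensionality (hypothetical)

# Toy latent codes for counterfactual sentence pairs. In the paper's setup,
# these would come from the cross-lingual autoencoder's encoder; here they
# are synthetic, with a known bias axis injected along the first coordinate.
bias_true = np.zeros(d)
bias_true[0] = 1.0
z_a = rng.normal(size=(100, d))
z_b = z_a + rng.normal(1.0, 0.1, size=(100, 1)) * bias_true

# SentenceDebias-style estimation: the top principal directions of the
# centered pair differences span the bias subspace.
diffs = z_a - z_b
_, _, vt = np.linalg.svd(diffs - diffs.mean(axis=0), full_matrices=False)
bias_basis = vt[:1]  # rank-1 bias subspace (k=1 here)

def debias(z):
    """Project latent vectors onto the orthogonal complement of the bias subspace."""
    return z - (z @ bias_basis.T) @ bias_basis

z_clean = debias(z_a)
# After projection, the latents have no component along the bias direction.
print(np.abs(z_clean @ bias_basis.T).max())  # ~0
```

The design choice the paper argues for is *where* this projection happens: applying it in the aligned latent space, rather than on raw per-language LLM representations, lets one estimated bias subspace transfer across languages.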