🤖 AI Summary
This work addresses cross-lingual factual inconsistency in large language models, which often fail to express in other languages the factual knowledge they encode from predominantly English training data. The authors introduce PolyFact, a large-scale parallel multilingual factual question-answering dataset covering twelve typologically diverse languages, and use it to compare light continued pretraining, supervised fine-tuning, and reinforcement learning via Group Relative Policy Optimization (GRPO) on Qwen-2.5-7B and OLMo-2-1124-7B. GRPO consistently outperforms supervised fine-tuning, improving both the cross-lingual consistency of factual recall and generalization to unseen languages, while continued pretraining on parallel data yields limited additional gains. Mechanistic analysis indicates that GRPO promotes shared cross-lingual representations by reducing language specialization in MLP layers and attention heads. The code, models, and PolyFact dataset are publicly released.
📝 Abstract
Large language models (LLMs) trained predominantly on English data encode substantial world knowledge, yet often fail to express it reliably in other languages, a phenomenon known as cross-lingual factual inconsistency. To study and address this, we introduce PolyFact, a large-scale parallel multilingual factual QA dataset containing 100K Wikidata-grounded facts across 12 typologically diverse languages. Using PolyFact, we compare light continual pretraining (CPT), supervised fine-tuning (SFT), and reinforcement learning via Group Relative Policy Optimization (GRPO) for improving cross-lingual factual recall in Qwen-2.5-7B and OLMo-2-1124-7B. We find that GRPO consistently outperforms SFT, improving both cross-lingual consistency and generalization to unseen languages, while CPT on parallel data yields limited additional gains. Mechanistic analyses further show that GRPO reorganizes multilingual routing by reducing language specialization in MLP layers and attention heads, thereby promoting more shared cross-lingual representations. We release our code, models, and dataset.
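To make the GRPO step more concrete, the sketch below shows the group-relative advantage computation that gives the method its name: several answers are sampled for the same question, each is scored, and each score is normalized against the mean and standard deviation of its own group. The exact-match reward, the example answers, and the choice of population standard deviation are illustrative assumptions, not the reward or configuration used by the authors.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own sampling group (GRPO-style).

    Assumption: population standard deviation; implementations differ on this.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every sample gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


def exact_match_reward(answer, gold):
    """Hypothetical reward: 1.0 if the normalized answer equals the gold answer."""
    return 1.0 if answer.strip().lower() == gold.strip().lower() else 0.0


# Hypothetical group of answers sampled for one factual question.
samples = ["Paris", "Lyon", "paris ", "Marseille"]
rewards = [exact_match_reward(s, "Paris") for s in samples]
advantages = group_relative_advantages(rewards)

for sample, adv in zip(samples, advantages):
    print(f"{sample!r}: advantage {adv:+.3f}")
```

Correct answers receive a positive advantage and incorrect ones a negative advantage relative to the group, so no separate value model is needed. A group in which every answer scores the same yields zero advantages and contributes no learning signal.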