Improving Cross-Lingual Factual Recall via Consistency-Driven Reinforcement Learning

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses cross-lingual factual inconsistency in large language models: models trained predominantly on English often fail to express the factual knowledge they encode reliably in other languages. The authors introduce PolyFact, a large-scale parallel multilingual factual question-answering dataset containing 100K Wikidata-grounded facts across 12 typologically diverse languages, and apply consistency-driven reinforcement learning based on Group Relative Policy Optimization (GRPO). Using PolyFact, they compare light continual pretraining (CPT), supervised fine-tuning (SFT), and GRPO on Qwen-2.5-7B and OLMo-2-1124-7B. GRPO consistently outperforms SFT, improving both cross-lingual consistency and generalization to unseen languages, while CPT on parallel data yields limited additional gains. Mechanistic analysis indicates that GRPO reduces language specialization in MLP layers and attention heads, promoting more shared cross-lingual representations. The code, models, and PolyFact dataset are publicly released.
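
To make "cross-lingual consistency" concrete, the sketch below scores how much the sets of correctly answered facts overlap between language pairs. It is only an illustration: the Jaccard-style overlap, the function name `pairwise_consistency`, and the input layout are assumptions made here, not the metric defined in the paper.

```python
from itertools import combinations


def pairwise_consistency(correct_by_lang):
    """Jaccard overlap of correctly answered facts for every language pair.

    correct_by_lang maps a language code to a list of booleans, one per
    parallel fact (True if the model answered that fact correctly).
    This metric is an illustrative assumption, not the paper's definition.
    """
    langs = sorted(correct_by_lang)
    n_facts = len(correct_by_lang[langs[0]])
    if any(len(correct_by_lang[lang]) != n_facts for lang in langs):
        raise ValueError("all languages must cover the same parallel facts")

    scores = {}
    for a, b in combinations(langs, 2):
        pairs = list(zip(correct_by_lang[a], correct_by_lang[b]))
        both = sum(1 for x, y in pairs if x and y)    # correct in both languages
        either = sum(1 for x, y in pairs if x or y)   # correct in at least one
        scores[(a, b)] = both / either if either else 0.0
    return scores


# Toy data: four parallel facts evaluated in two languages.
toy = {
    "en": [True, True, False, True],
    "de": [True, False, False, True],
}
consistency = pairwise_consistency(toy)
```

A score near 1 for a pair means the model tends to know the same facts in both languages; a low score means recall depends heavily on the query language.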

📝 Abstract

Large language models (LLMs) trained predominantly on English data encode substantial world knowledge, yet often fail to express it reliably in other languages, a phenomenon known as cross-lingual factual inconsistency. To study and address this, we introduce PolyFact, a large-scale parallel multilingual factual QA dataset containing 100K Wikidata-grounded facts across 12 typologically diverse languages. Using PolyFact, we compare light continual pretraining (CPT), supervised fine-tuning (SFT), and reinforcement learning via Group Relative Policy Optimization (GRPO) for improving cross-lingual factual recall in Qwen-2.5-7B and OLMo-2-1124-7B. We find that GRPO consistently outperforms SFT, improving both cross-lingual consistency and generalization to unseen languages, while CPT on parallel data yields limited additional gains. Mechanistic analyses further show that GRPO reorganizes multilingual routing by reducing language specialization in MLP layers and attention heads, thereby promoting more shared cross-lingual representations. We release our code, models, and dataset.
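
For readers unfamiliar with GRPO, its core step is to sample a group of answers for the same prompt and score each answer relative to the group, rather than with a learned value model. The sketch below illustrates that normalization under stated assumptions: the exact-match reward, the helper names, and the use of the population standard deviation are hypothetical choices made here, and the paper's actual reward design may differ.

```python
from statistics import mean, pstdev


def exact_match_reward(prediction, gold):
    """Hypothetical reward: 1.0 if the answer matches the gold string, else 0.0."""
    return 1.0 if prediction.strip().casefold() == gold.strip().casefold() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one sampled group (zero mean, unit scale).

    eps guards against division by zero when every sample gets the same reward;
    in that case all advantages are 0 and the group gives no learning signal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std: an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Toy group: four sampled answers to one question whose gold answer is "Paris".
samples = ["Paris", "Lyon", "Marseille", "paris"]
rewards = [exact_match_reward(s, "Paris") for s in samples]
advantages = group_relative_advantages(rewards)
```

Answers that beat the group average receive positive advantages and are reinforced, while below-average answers are pushed down.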

Problem

Research questions and friction points this paper is trying to address.

cross-lingual factual inconsistency

factual recall

multilingual language models

language generalization

Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-lingual factual consistency

reinforcement learning

GRPO

multilingual representation