Improving Cross-Lingual Factual Recall via Consistency-Driven Reinforcement Learning

📅 2026-06-04
🤖 AI Summary
This work addresses cross-lingual factual inconsistency in large language models: models trained predominantly on English data often fail to express the facts they know reliably in other languages. The authors introduce PolyFact, a large-scale parallel multilingual factual question-answering dataset containing 100K Wikidata-grounded facts across 12 typologically diverse languages. Using PolyFact, they compare light continual pretraining (CPT), supervised fine-tuning (SFT), and reinforcement learning via Group Relative Policy Optimization (GRPO) on Qwen-2.5-7B and OLMo-2-1124-7B. GRPO consistently outperforms SFT, improving both cross-lingual consistency and generalization to unseen languages, while CPT on parallel data yields limited additional gains. Mechanistic analysis indicates that GRPO reduces language specialization in MLP layers and attention heads, promoting more shared cross-lingual representations. The code, models, and PolyFact dataset are released.
📝 Abstract
Large language models (LLMs) trained predominantly on English data encode substantial world knowledge, yet often fail to express it reliably in other languages, a phenomenon known as cross-lingual factual inconsistency. To study and address this, we introduce PolyFact, a large-scale parallel multilingual factual QA dataset containing 100K Wikidata-grounded facts across 12 typologically diverse languages. Using PolyFact, we compare light continual pretraining (CPT), supervised fine-tuning (SFT), and reinforcement learning via Group Relative Policy Optimization (GRPO) for improving cross-lingual factual recall in Qwen-2.5-7B and OLMo-2-1124-7B. We find that GRPO consistently outperforms SFT, improving both cross-lingual consistency and generalization to unseen languages, while CPT on parallel data yields limited additional gains. Mechanistic analyses further show that GRPO reorganizes multilingual routing by reducing language specialization in MLP layers and attention heads, thereby promoting more shared cross-lingual representations. We release our code, models, and dataset.
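As an illustration of the group-relative idea behind GRPO, the sketch below normalizes rewards within one group of sampled answers to the same question. It is not the paper's implementation: the exact-match reward, the example answers, and the function names are all assumptions made for this example, and the paper's actual reward design is not described on this page. The sketch uses the population standard deviation; implementations differ on this detail.

```python
from statistics import mean, pstdev


def exact_match_reward(answer: str, gold: str) -> float:
    """Hypothetical reward: 1.0 if the answer matches the gold string, else 0.0."""
    return 1.0 if answer.strip().lower() == gold.strip().lower() else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Center and scale rewards within one sampled group (GRPO-style).

    Each advantage is (reward - group mean) / (group std + eps), so answers
    are scored relative to the other samples for the same prompt.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of sampled answers to one factual question.
samples = ["Paris", "Lyon", "paris ", "Marseille"]
rewards = [exact_match_reward(s, "Paris") for s in samples]
advantages = group_relative_advantages(rewards)

# A group where every sample earns the same reward yields zero advantages.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Correct answers in the group receive positive advantages and incorrect ones negative, and the advantages sum to roughly zero. A group with identical rewards carries no learning signal under this normalization.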
Problem

Research questions and friction points this paper is trying to address.

cross-lingual factual inconsistency
factual recall
multilingual language models
language generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

cross-lingual factual consistency
reinforcement learning
GRPO
multilingual representation
factual recall
🔎 Similar Papers
2024-06-20, International Conference on Computational Linguistics, Citations: 2
Jonathan von Rad
University College London, Centre for Artificial Intelligence
Louis Arts
University College London, Centre for Artificial Intelligence
George Burgess
University College London, Centre for Artificial Intelligence
Eleftheria Kolokytha
University College London, Centre for Artificial Intelligence
Harry O'Donnell
University College London, Centre for Artificial Intelligence
Ektor Oikonomidis Doumpas
University College London, Centre for Artificial Intelligence
Eduardo Sanchez
University College London, Centre for Artificial Intelligence
Yao Lu
Assistant Professor @ University College London
Natural Language Processing
Pontus Stenetorp
University College London, Centre for Artificial Intelligence