Parallel Scaling Law: Unveiling Reasoning Generalization through A Cross-Linguistic Perspective

📅 2025-10-02
📈 Citations: 0 (influential: 0)
🤖 AI Summary
This study investigates the cross-lingual generalization of reasoning in Large Reasoning Models (LRMs) trained with English-only Reinforcement Post-Training (RPT). The authors propose the first systematic framework for studying cross-lingual reasoning generalization, comprising a multilingual benchmark, controlled intervention experiments, and a parallel training paradigm; they further introduce the "Monolingual Generalization Gap" and a metric quantifying cross-lingual transferability. Key findings include: (1) the "First-Parallel Leap," wherein introducing even a single non-English parallel language yields disproportionate gains in cross-lingual reasoning; and (2) a Parallel Scaling Law, whereby cross-lingual reasoning performance improves as a power-law function of the number of parallel training languages. Empirical results show that LRMs with strong English capabilities overfit English-specific patterns, harming cross-lingual generalization; parallel training, in contrast, substantially enhances multilingual reasoning, with the base model, target language, and training paradigm all playing critical moderating roles.
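
Stated as a formula, the Parallel Scaling Law says that cross-lingual reasoning performance grows as a power law in the number of parallel training languages n. The parameterization below is an assumed illustration only; the paper's exact functional form and constants are not given in this summary:

    % Assumed illustrative form; the scale \alpha and exponent \beta are hypothetical.
    % Perf(n): cross-lingual reasoning performance with n parallel training languages.
    \mathrm{Perf}(n) \approx \alpha \, n^{\beta}, \qquad \alpha > 0,\; 0 < \beta < 1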

📝 Abstract
Recent advancements in Reinforcement Post-Training (RPT) have significantly enhanced the capabilities of Large Reasoning Models (LRMs), sparking increased interest in the generalization of RL-based reasoning. While existing work has primarily focused on generalization across tasks or modalities, this study proposes a novel cross-linguistic perspective and asks a crucial question: does the reasoning capability achieved from English RPT effectively transfer to other languages? We address this by systematically evaluating English-centric LRMs on multilingual reasoning benchmarks and introducing a metric to quantify cross-lingual transferability. Our findings reveal that cross-lingual transferability varies significantly with the initial model, target language, and training paradigm. Through interventional studies, we find that models with stronger initial English capabilities tend to over-rely on English-specific patterns, leading to diminished cross-lingual generalization. To address this, we conduct a thorough parallel training study. Experimental results yield three key findings: the First-Parallel Leap, a substantial jump in performance when moving from monolingual training to even a single parallel language; a predictable Parallel Scaling Law, showing that cross-lingual reasoning transfer follows a power law in the number of parallel training languages; and the Monolingual Generalization Gap, the discrepancy between actual monolingual performance and the power-law prediction, indicating that English-centric LRMs fail to fully generalize across languages. Our study challenges the assumption that LRM reasoning mirrors human cognition, providing critical insights for the development of more language-agnostic LRMs.
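
To make the power-law claim concrete, here is a minimal sketch of how such a law can be fitted from benchmark scores by linear regression in log-log space. All language counts and accuracies are invented for illustration; only numpy is assumed:

    # Hypothetical illustration: fit Perf(n) = a * n^b to accuracies measured
    # after training with n parallel languages. All numbers are made up.
    import numpy as np

    n_langs = np.array([1, 2, 4, 8])               # parallel training languages
    accuracy = np.array([0.42, 0.51, 0.58, 0.66])  # toy multilingual accuracy

    # A power law is linear in log-log space: log(acc) = log(a) + b * log(n).
    b, log_a = np.polyfit(np.log(n_langs), np.log(accuracy), deg=1)
    a = np.exp(log_a)

    print(f"fitted power law: Perf(n) ~= {a:.3f} * n^{b:.3f}")
    print(f"extrapolated Perf(16) ~= {a * 16 ** b:.3f}")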
Problem

Research questions and friction points this paper is trying to address.

Investigating whether reasoning capabilities acquired via English RPT transfer to other languages
Quantifying transferability across target languages, initial models, and training paradigms (a hypothetical metric sketch follows this list)
Addressing the monolingual generalization gap through parallel training and its scaling behavior
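
The paper's exact transferability metric is not reproduced on this page; the sketch below shows one plausible formulation, the ratio of the average non-English RPT gain to the English gain. The function name and all scores are hypothetical:

    # Hypothetical transferability metric: average non-English gain from RPT
    # divided by the English gain. Names and numbers are illustrative only.
    def transferability(base_scores, rpt_scores):
        """base_scores, rpt_scores: dicts mapping language code -> accuracy."""
        en_gain = rpt_scores["en"] - base_scores["en"]
        non_en = [lang for lang in base_scores if lang != "en"]
        avg_gain = sum(rpt_scores[l] - base_scores[l] for l in non_en) / len(non_en)
        return avg_gain / en_gain if en_gain else float("nan")

    base = {"en": 0.50, "zh": 0.45, "fr": 0.44}
    rpt  = {"en": 0.70, "zh": 0.52, "fr": 0.50}
    print(f"transferability ~= {transferability(base, rpt):.2f}")  # ~0.33 here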
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-linguistic evaluation framework for reasoning generalization in LRMs
Parallel training study revealing a power-law scaling of cross-lingual performance
Quantification of the monolingual generalization gap via multilingual benchmarks (an illustrative sketch follows this list)
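
Likewise, the Monolingual Generalization Gap can be illustrated as the difference between measured monolingual performance and what the fitted power law predicts at n = 1. The parameters below reuse the invented fit from the earlier sketch, not the paper's values:

    # Hypothetical gap: measured monolingual performance minus the power-law
    # prediction at n = 1 (which is just the scale a, since a * 1**b == a).
    a, b = 0.42, 0.22              # fitted power-law parameters (made up)
    measured_monolingual = 0.48    # toy English-only RPT accuracy
    gap = measured_monolingual - a * 1 ** b
    print(f"monolingual generalization gap ~= {gap:+.3f}")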
Wen Yang
School of Artificial Intelligence, University of Chinese Academy of Sciences; Institute of Automation, Chinese Academy of Sciences
Junhong Wu
PhD student, Institute of Automation, Chinese Academy of Sciences
Natural language processing, lifelong learning
Chong Li
School of Artificial Intelligence, University of Chinese Academy of Sciences; Institute of Automation, Chinese Academy of Sciences
Chengqing Zong
School of Artificial Intelligence, University of Chinese Academy of Sciences; Institute of Automation, Chinese Academy of Sciences
Jiajun Zhang
Institute of Automation, Chinese Academy of Sciences
Natural Language Processing, Large Language Models, Multimodal Information Processing