🤖 AI Summary
This study investigates the cross-lingual reasoning generalization of large reasoning models (LRMs) trained via English Reinforcement Post-Training (RPT). We propose a systematic framework for studying cross-lingual reasoning generalization, comprising multilingual benchmark evaluation, controlled intervention experiments, and a parallel training paradigm; we further introduce the “Monolingual Generalization Gap” and a metric quantifying cross-lingual transferability. Key findings include: (1) a “First-Parallel Leap”, wherein moving from monolingual training to even a single parallel language yields a disproportionate gain in cross-lingual reasoning; and (2) a Parallel Scaling Law: cross-lingual reasoning transfer follows a power law in the number of parallel training languages. Empirical results show that LRMs with stronger initial English capabilities over-rely on English-specific patterns, which diminishes cross-lingual generalization; in contrast, parallel training enhances multilingual reasoning. Cross-lingual transferability also varies significantly with the base model, target language, and training paradigm.
📝 Abstract
Recent advancements in Reinforcement Post-Training (RPT) have significantly enhanced the capabilities of Large Reasoning Models (LRMs), sparking increased interest in the generalization of RL-based reasoning. While existing work has primarily focused on generalization across tasks or modalities, this study proposes a novel cross-linguistic perspective on reasoning generalization. This raises a crucial question: $\textit{Does the reasoning capability achieved from English RPT effectively transfer to other languages?}$ We address this by systematically evaluating English-centric LRMs on multilingual reasoning benchmarks and introducing a metric to quantify cross-lingual transferability. Our findings reveal that cross-lingual transferability varies significantly with the initial model, target language, and training paradigm. Through interventional studies, we find that models with stronger initial English capabilities tend to over-rely on English-specific patterns, leading to diminished cross-lingual generalization. To address this, we conduct a thorough parallel training study. Experimental results yield three key findings: the $\textbf{First-Parallel Leap}$, a substantial leap in performance when transitioning from monolingual training to just a single parallel language; a predictable $\textbf{Parallel Scaling Law}$, revealing that cross-lingual reasoning transfer follows a power law in the number of parallel training languages; and the $\textbf{Monolingual Generalization Gap}$, the discrepancy between actual monolingual performance and the power-law prediction, indicating that English-centric LRMs fail to fully generalize across languages. Our study challenges the assumption that LRM reasoning mirrors human cognition, providing critical insights for the development of more language-agnostic LRMs.
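To make the Parallel Scaling Law and the Monolingual Generalization Gap concrete, the sketch below fits a power law to transfer scores and compares its prediction with a monolingual score. It is only an illustration under stated assumptions: the abstract does not give the functional form, so `y = a * x**b` with `x` counted as the total number of training languages (monolingual is `x = 1`) is assumed here, the helper names `fit_power_law` and `monolingual_gap` are hypothetical, and every number is synthetic rather than taken from the paper.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b in log-log space; returns (a, b).

    Assumed form only: the abstract states a power law but not its
    parameterization.
    """
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(xs)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept in log-log space = log(a)
    return a, b


def monolingual_gap(a, b, observed_monolingual, n_languages=1):
    """Power-law prediction at the monolingual point minus the observed score."""
    predicted = a * n_languages ** b
    return predicted - observed_monolingual


# Synthetic scores generated from y = 2 * x**0.5 (NOT results from the paper).
num_languages = [2, 3, 4, 5, 8]
transfer_scores = [2.0 * x ** 0.5 for x in num_languages]

a, b = fit_power_law(num_languages, transfer_scores)

# Hypothetical monolingual (English-only) score that falls below the curve.
gap = monolingual_gap(a, b, observed_monolingual=1.5)
print(f"a={a:.3f}, b={b:.3f}, gap={gap:.3f}")
```

The curve is fitted only on the parallel settings (`x >= 2`) and then extrapolated back to `x = 1`; under this reading, a positive gap means the monolingual model scores below what the parallel-training trend would predict.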