Learning When to Translate for Multilingual Reasoning

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work targets the multilingual reasoning gap that arises when reasoning language models fail to understand non-English inputs. It proposes Luar, a Language Understanding Boundary-aware Reinforcement Learning framework that trains the model to choose between solving the original input directly and reasoning over its English translation, invoking the translator only when direct understanding is unreliable. On multilingual reasoning benchmarks, Luar outperforms standard GRPO and other training-based baselines, with particularly large gains on low-resource languages. It also avoids unnecessary translation when direct reasoning is sufficient and extends its translator-call behavior to unseen low-resource languages.

📝 Abstract

Reasoning language models (RLMs) achieve strong performance on complex reasoning tasks, but still exhibit substantial multilingual reasoning gaps, largely due to language-understanding failures in non-English inputs. English translation can mitigate these failures by expressing non-English inputs in a form that RLMs can more reliably interpret, yet translating every input is unnecessary when the model can reason reliably from the original query. To address this challenge, we propose Luar, a Language Understanding Boundary-aware Reinforcement Learning framework that trains RLMs to selectively invoke translation when direct understanding is unreliable. Luar trains the model to choose between solving the original input directly and reasoning over its English translation, encouraging translation only when translator-augmented reasoning is expected to substantially outperform direct reasoning. Across multilingual reasoning benchmarks, Luar outperforms standard GRPO and other training-based baselines, with particularly large gains on low-resource languages. Further analysis shows that Luar avoids unnecessary translation in cases where direct reasoning is sufficient, while extending its translator-call behavior to unseen low-resource languages. Together, our work suggests a selective approach to multilingual reasoning: RLMs can learn to invoke translation only when their direct understanding is unreliable. The project will be made publicly available at https://github.com/deokhk/LUAR
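The selection criterion in the abstract (translate only when translator-augmented reasoning is expected to substantially outperform direct reasoning) can be illustrated with a minimal sketch. This is not Luar's training objective, which the abstract does not specify. The function names, the 0/1 correctness rewards, and the margin of 0.2 are all assumptions made for illustration.

```python
from statistics import mean


def should_translate(direct_rewards, translated_rewards, margin=0.2):
    """Hypothetical decision rule: prefer translation only when its
    average reward exceeds the direct average by more than `margin`.

    Both arguments are lists of per-rollout rewards (assumed here to be
    1 for a correct answer and 0 otherwise). `margin` is an illustrative
    threshold, not a value taken from the paper.
    """
    if not direct_rewards or not translated_rewards:
        raise ValueError("both reward lists must be non-empty")
    gap = mean(translated_rewards) - mean(direct_rewards)
    return gap > margin


def translation_decisions(rollouts_by_language, margin=0.2):
    """Apply the rule to each language.

    `rollouts_by_language` maps a language code to a pair
    (direct_rewards, translated_rewards).
    """
    return {
        lang: should_translate(direct, translated, margin)
        for lang, (direct, translated) in rollouts_by_language.items()
    }


# Made-up rewards for illustration only, not results from the paper.
example = {
    "fr": ([1, 1, 1, 1], [1, 1, 1, 1]),  # direct reasoning already reliable
    "sw": ([0, 0, 1, 0], [1, 1, 1, 0]),  # translation helps substantially
}
decisions = translation_decisions(example)
```

Under these assumed rewards, the rule skips translation for the language where direct reasoning is already reliable and calls the translator for the one where the gap is large, which is the qualitative behavior the abstract describes.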

Problem

Research questions and friction points this paper is trying to address.

multilingual reasoning

translation

language understanding

reasoning language models

low-resource languages

Innovation

Methods, ideas, or system contributions that make the work stand out.

selective translation

multilingual reasoning

reinforcement learning