R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning

📅 2025-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) for machine translation suffer from deficient reasoning capabilities: existing chain-of-thought (CoT) methods rely on rigid templates and lack human alignment, while supervised fine-tuning (SFT) induces catastrophic forgetting. Method: We propose a reasoning-driven zero-shot cross-lingual transfer framework that introduces structured, multi-level human translation strategies, formalized as six expert-curated CoT templates, into general-purpose translation. Our approach integrates KL-constrained reinforcement learning with multi-stage reasoning distillation to jointly achieve human-aligned CoT modeling and autonomous CoT discovery. Results: Evaluated on Flores-101 across 21 languages and 80 translation directions, our method significantly improves translation quality, especially for 15 languages unseen during training, and outperforms standard SFT in cross-lingual generalization. It also demonstrates robust performance in specialized domains such as law and healthcare.

📝 Abstract
Despite recent breakthroughs in reasoning-enhanced large language models (LLMs) like DeepSeek-R1, incorporating inference-time reasoning into machine translation (MT), where human translators naturally employ structured, multi-layered reasoning chains-of-thought (CoTs), remains underexplored. Existing methods either design a fixed CoT tailored to a specific MT sub-task (e.g., literature translation), or rely on synthesized CoTs unaligned with human reasoning and on supervised fine-tuning (SFT), which is prone to catastrophic forgetting, limiting their adaptability to diverse translation scenarios. This paper introduces R1-Translator (R1-T1), a novel framework to achieve inference-time reasoning for general MT via reinforcement learning (RL) with human-aligned CoTs comprising six common patterns. Our approach pioneers three innovations: (1) extending reasoning-based translation beyond MT sub-tasks to six languages and diverse tasks (e.g., legal/medical domain adaptation, idiom resolution); (2) formalizing six expert-curated CoT templates that mirror hybrid human strategies like context-aware paraphrasing and back translation; and (3) enabling self-evolving CoT discovery and anti-forgetting adaptation through RL with KL-constrained rewards. Experimental results indicate steady translation performance improvements across 21 languages and 80 translation directions on the Flores-101 test set, especially for the 15 languages unseen during training, with general multilingual abilities preserved compared with plain SFT.
Problem

Research questions and friction points this paper is trying to address.

Enhancing translation in LLMs via reasoning learning.
Addressing adaptability in diverse translation scenarios.
Improving multilingual translation using human-aligned CoTs.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reinforcement learning with human-aligned CoTs
Self-evolving CoT discovery via RL
Multilingual translation across diverse tasks
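The KL-constrained reward used in the RL stage can be sketched as follows. This is a minimal illustration of the standard RLHF-style formulation r = r_task − β·KL(π_θ ‖ π_ref); the function name, the `beta` value, and the per-token log-probability inputs are assumptions for illustration, not details taken from the paper:

```python
def kl_penalized_reward(task_reward, logprobs_policy, logprobs_ref, beta=0.1):
    """Translation reward minus a KL penalty toward a frozen reference model.

    The KL term discourages the fine-tuned policy from drifting away from
    the pre-trained model's distribution, which is the mechanism that guards
    against catastrophic forgetting. `beta` controls the constraint strength
    (the value here is an illustrative assumption).
    """
    # Monte Carlo per-token KL estimate on the sampled translation:
    # sum_t [ log pi_theta(y_t | ctx) - log pi_ref(y_t | ctx) ]
    kl_estimate = sum(lp - lr for lp, lr in zip(logprobs_policy, logprobs_ref))
    return task_reward - beta * kl_estimate
```

When the policy and reference assign identical log-probabilities, the penalty vanishes and the raw translation-quality reward passes through unchanged.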
Minggui He
Huawei, China
Yilun Liu
Huawei, China
Shimin Tao
2012 Lab, Huawei Co., Ltd.
Machine Translation · AIOps · Log Analysis
Yuanchang Luo
2012@Huawei
Hongyong Zeng
Huawei, China
Chang Su
Huawei, China
Li Zhang
Huawei, China
Hongxia Ma
Huawei, China
Daimeng Wei
Huawei, China
Weibin Meng
Huawei, China
Hao Yang
Huawei, China
Boxing Chen
Huawei Technologies Canada
Natural Language Processing · Artificial Intelligence
Osamu Yoshie
Waseda University