Unlocking Fine-Grained Translation Quality Estimation in LRMs through Synergistically Evolving Implicit and Explicit Reasoning

📅 2026-05-29
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the limitations of large reasoning models (LRMs) in fine-grained machine translation quality estimation (QE), a task made difficult by its high complexity. The authors propose RIEQE, a framework that, for the first time, jointly optimizes implicit (layer-wise) and explicit (token-level) reasoning. The complex QE task is first decomposed into simpler, learnable subtasks. RIEQE then strengthens implicit reasoning through supervised fine-tuning without chain-of-thought (NonThinking-SFT), followed by reinforcement learning with verifiable rewards (Thinking-RLVR) to refine explicit reasoning. Under this framework, the two reasoning modes reinforce each other. Evaluated on WMT test sets with Qwen3-4B-Thinking-2507 as the base model, RIEQE outperforms all baselines when reasoning explicitly, while its implicit reasoning performance is on par with state-of-the-art encoder-based models.
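The two training stages differ mainly in what the model is asked to produce. The sketch below is a minimal illustration, not the authors' code: it assumes Qwen3-style `<think>...</think>` delimiters, treats an empty reasoning block as the "non-thinking" target, and uses a hypothetical `span | severity` answer format. `QESample`, `build_prompt`, `nonthinking_sft_example`, and `thinking_rlvr_example` are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class QESample:
    """Hypothetical QE training item: gold errors are (span, severity) pairs."""
    source: str
    translation: str
    gold_errors: list[tuple[str, str]] = field(default_factory=list)


def build_prompt(sample: QESample) -> str:
    """Hypothetical QE instruction shared by both training stages."""
    return (
        f"Source: {sample.source}\n"
        f"Translation: {sample.translation}\n"
        "List each error span and its severity as 'span | severity'."
    )


def format_answer(errors: list[tuple[str, str]]) -> str:
    """Render gold errors in the assumed 'span | severity' line format."""
    if not errors:
        return "No errors."
    return "\n".join(f"{span} | {severity}" for span, severity in errors)


def nonthinking_sft_example(sample: QESample) -> dict:
    """Stage 1 (assumed format): the target has an empty reasoning block,
    so the model must produce the annotation directly, with no chain."""
    target = "<think>\n\n</think>\n\n" + format_answer(sample.gold_errors)
    return {"prompt": build_prompt(sample), "target": target}


def thinking_rlvr_example(sample: QESample) -> dict:
    """Stage 2: prompt only. The policy writes its own reasoning chain, and
    a verifier later scores its final annotation against the gold errors."""
    return {"prompt": build_prompt(sample), "gold": list(sample.gold_errors)}
```

For instance, `nonthinking_sft_example(QESample("Ich gehe zur Bank.", "I go to the bench.", [("bench", "major")]))` yields a target that ends in `bench | major` with nothing inside the reasoning block. The RLVR item carries no target text at all, only the gold annotation the reward is computed from.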
๐Ÿ“ Abstract
Large Reasoning Models (LRMs) still struggle with fine-grained translation quality estimation (QE), even with long reasoning chains. We argue that LRMs already possess strong multilingual capabilities, while the core challenge stems from the intrinsic difficulty of learning the fine-grained QE task. In this paper, we propose RIEQE (Reasoning both Implicitly and Explicitly for QE), a simple two-stage training framework that enables the co-evolution of implicit (layer-wise) and explicit (token-wise) reasoning capabilities. To make implicit reasoning feasible, we first decompose the complex QE task into straightforward subtasks. Based on this, our two-stage approach applies: (1) NonThinking-SFT, Supervised Fine-Tuning (SFT) without reasoning chains to directly boost the model's implicit reasoning tendency and capability; and (2) Thinking-RLVR, standard Reinforcement Learning with Verifiable Reward (RLVR) to subsequently strengthen explicit reasoning. Results demonstrate that implicit and explicit reasoning synergistically co-evolve under our framework. On the WMT test sets, RIEQE based on Qwen3-4B-Thinking-2507 surpasses all baselines in explicit reasoning performance, while its implicit reasoning capability is also comparable to the best current encoder-based models. We further provide evidence for the synergistic collaboration between implicit and explicit reasoning, showing how they mutually benefit each other.
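RLVR needs a reward that can be checked automatically against a reference annotation. The abstract does not specify RIEQE's reward, so the following is only a minimal sketch of one plausible choice: an F1 score over exact-match `(span, severity)` pairs. It assumes the final answer follows the last `</think>` tag and uses a hypothetical `span | severity` line format with MQM-style severity labels. `parse_errors` and `span_f1_reward` are hypothetical names, and the paper may score spans differently (for example, by character overlap).

```python
def parse_errors(completion: str) -> set[tuple[str, str]]:
    """Parse 'span | severity' lines from the final answer (assumed format)."""
    # Ignore the reasoning chain: keep only text after the last </think>, if any.
    answer = completion.rsplit("</think>", 1)[-1]
    errors = set()
    for line in answer.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        # Assumed MQM-style severity labels; malformed lines are skipped.
        if len(parts) == 2 and parts[1].lower() in {"minor", "major", "critical"}:
            errors.add((parts[0], parts[1].lower()))
    return errors


def span_f1_reward(completion: str, gold: set[tuple[str, str]]) -> float:
    """Verifiable reward: F1 between predicted and gold (span, severity) pairs."""
    pred = parse_errors(completion)
    if not pred and not gold:
        return 1.0  # correctly judged the translation error-free
    tp = len(pred & gold)
    if tp == 0:
        return 0.0  # covers empty-vs-nonempty and fully wrong predictions
    precision = tp / len(pred)
    recall = tp / len(gold)
    return 2 * precision * recall / (precision + recall)
```

Because the reward depends only on the parsed answer, the model is free to reason at any length before `</think>`. Only the final annotation is verified, which matches the RLVR setup the abstract describes.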
Problem

Research questions and friction points this paper is trying to address.

translation quality estimation
fine-grained evaluation
Large Reasoning Models
implicit reasoning
explicit reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

fine-grained translation quality estimation
implicit reasoning
explicit reasoning
two-stage training
synergistic co-evolution