Two Intermediate Translations Are Better Than One: Fine-tuning LLMs for Document-level Translation Refinement

📅 2025-04-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the split between sentence-level and document-level modeling in document-level machine translation, where the uneven quality of intermediate translations leads to inconsistent coherence and terminology. To tackle this, the authors propose a Dual Intermediate Translation Collaborative Refinement framework, the first to jointly leverage sentence-to-sentence (Sent2Sent) and document-to-document (Doc2Doc) intermediate translations to guide large language model (LLM) fine-tuning. A quality-aware adaptive loss weighting mechanism further emphasizes hard-to-translate samples during training. Supervised fine-tuning is conducted on LLaMA-3-8B-Instruct and Mistral-Nemo-Instruct, augmented by a translation quality assessment module for fine-grained optimization. Across ten cross-lingual document translation tasks, the method outperforms single-level refinement baselines, improving document coherence and terminology consistency and demonstrating both the effectiveness and the generality of collaborative refinement.

📝 Abstract
Recent research has shown that large language models (LLMs) can enhance translation quality through self-refinement. In this paper, we build on this idea by extending the refinement from sentence-level to document-level translation, specifically focusing on document-to-document (Doc2Doc) translation refinement. Since sentence-to-sentence (Sent2Sent) and Doc2Doc translation address different aspects of the translation process, we propose fine-tuning LLMs for translation refinement using two intermediate translations, combining the strengths of both Sent2Sent and Doc2Doc. Additionally, recognizing that the quality of intermediate translations varies, we introduce an enhanced fine-tuning method with quality awareness that assigns lower weights to easier translations and higher weights to more difficult ones, enabling the model to focus on challenging translation cases. Experimental results across ten translation tasks with LLaMA-3-8B-Instruct and Mistral-Nemo-Instruct demonstrate the effectiveness of our approach.
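To make the core idea concrete, a refinement step could present the model with the source document together with both intermediate translations and ask for a single improved output. The template below is a hypothetical sketch (the paper's actual prompt is not reproduced here); the function name, wording, and language pair are assumptions:

```python
def build_refinement_prompt(src_doc: str, sent2sent_hyp: str, doc2doc_hyp: str,
                            src_lang: str = "German", tgt_lang: str = "English") -> str:
    """Assemble a hypothetical instruction prompt that conditions the LLM on
    both intermediate translations, so refinement can draw on the strengths
    of each (Sent2Sent accuracy, Doc2Doc coherence)."""
    return (
        f"Refine the following {tgt_lang} translations of a {src_lang} document.\n\n"
        f"Source document:\n{src_doc}\n\n"
        f"Sentence-by-sentence translation (Sent2Sent):\n{sent2sent_hyp}\n\n"
        f"Whole-document translation (Doc2Doc):\n{doc2doc_hyp}\n\n"
        f"Produce one improved {tgt_lang} translation that combines the "
        f"sentence-level accuracy of the first draft with the document-level "
        f"coherence of the second."
    )
```

During supervised fine-tuning, such a prompt would be paired with the reference translation as the target sequence.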
Problem

Research questions and friction points this paper is trying to address.

Extending translation refinement from sentence-level to document-level
Combining strengths of Sent2Sent and Doc2Doc translation approaches
Enhancing fine-tuning with quality-aware weighting for difficult translations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning LLMs for Doc2Doc translation refinement
Using two intermediate translations combining Sent2Sent and Doc2Doc
Enhanced fine-tuning with quality-aware weighting
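The quality-aware weighting above can be sketched as a reweighted cross-entropy in which samples whose intermediate translations score lower (harder cases) receive larger weights. This is a minimal PyTorch sketch of the idea, not the paper's exact formulation; the linear `1 - quality` weighting and the normalization scheme are assumptions:

```python
import torch
import torch.nn.functional as F

def quality_aware_loss(logits, targets, quality_scores, pad_id=-100):
    """Per-sample cross-entropy reweighted so that low-quality (harder)
    intermediate translations contribute more to the fine-tuning loss.

    logits:         (batch, seq_len, vocab) model outputs
    targets:        (batch, seq_len) reference token ids
    quality_scores: (batch,) scores in [0, 1]; 1 = easy / high quality
    """
    # Token-level cross-entropy, kept unreduced so we can pool per sample.
    ce = F.cross_entropy(
        logits.transpose(1, 2), targets,
        ignore_index=pad_id, reduction="none",
    )  # (batch, seq_len)
    mask = (targets != pad_id).float()
    per_sample = (ce * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1)

    # Harder samples (lower quality score) get larger weights; the weights
    # are renormalized to mean 1 to keep the overall loss scale stable.
    weights = 1.0 - quality_scores
    weights = weights * weights.numel() / weights.sum().clamp(min=1e-8)
    return (weights * per_sample).mean()
```

In practice the quality scores would come from the paper's translation quality assessment module (or any reference-free QE metric), computed on each intermediate translation before fine-tuning.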
Yichen Dong
School of Computer Science and Technology, Soochow University, Suzhou, China
Xinglin Lyu
PhD Student of Software Engineering, Soochow University
Machine Translation · Natural Language Processing
Junhui Li
School of Computer Science and Technology, Soochow University, Suzhou, China
Daimeng Wei
Huawei Translation Services Center, Beijing, China
Min Zhang
Huawei Translation Services Center, Beijing, China
Shimin Tao
2012 Lab, Huawei Co., Ltd.
Machine Translation · AIOps · Log Analysis
Hao Yang
Huawei Translation Services Center, Beijing, China