Doc-Guided Sent2Sent++: A Sent2Sent++ Agent with Doc-Guided memory for Document-level Machine Translation

📅 2025-01-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of jointly optimizing translation quality, cross-sentence consistency, and fluency in document-level machine translation (DocMT). To this end, the authors propose an incremental, sentence-level constrained-decoding agent framework. The key contributions are: (1) Doc-Guided Memory, a memory mechanism that models inter-sentential consistency solely via the document summary and its translation, drastically reducing memory overhead; and (2) Sent2Sent++, a dynamic decoding paradigm that integrates preceding-sentence context and document-level summary information during incremental sentence generation, overcoming the limitations of Doc2Doc (loss of fine-grained sentence-level dependencies) and Doc2Sent (weakened contextual coherence). The framework incorporates a lightweight summary-driven memory module within an LLM-based agent architecture and is evaluated with document-level metrics including s-COMET, d-COMET, LTCR-1_f, and d-ppl. Extensive multilingual, multi-domain experiments demonstrate consistent superiority over state-of-the-art DocMT methods, with gains of +2.1 s-COMET, +3.4 d-COMET, and −18.7% d-ppl.

📝 Abstract
The field of artificial intelligence has witnessed significant advancements in natural language processing, largely attributed to the capabilities of Large Language Models (LLMs). These models form the backbone of Agents designed to address long-context dependencies, particularly in Document-level Machine Translation (DocMT). DocMT presents unique challenges, with quality, consistency, and fluency being the key metrics for evaluation. Existing approaches, such as Doc2Doc and Doc2Sent, either omit sentences or compromise fluency. This paper introduces Doc-Guided Sent2Sent++, an Agent that employs an incremental sentence-level forced decoding strategy to ensure every sentence is translated while enhancing the fluency of adjacent sentences. Our Agent leverages a Doc-Guided Memory, focusing solely on the summary and its translation, which we find to be an efficient approach to maintaining consistency. Through extensive testing across multiple languages and domains, we demonstrate that Sent2Sent++ outperforms other methods in terms of quality, consistency, and fluency. The results indicate that our approach achieves significant improvements in metrics such as s-COMET, d-COMET, LTCR-1_f, and document-level perplexity (d-ppl). The contributions of this paper include a detailed analysis of current DocMT research, the introduction of the Sent2Sent++ decoding method, the Doc-Guided Memory mechanism, and validation of its effectiveness across languages and domains.
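The decoding strategy described above can be sketched as a simple loop: the agent keeps only the document summary and its translation as persistent memory, and carries the immediately preceding sentence pair as local context while translating sentence by sentence. The sketch below is a minimal illustration, not the paper's implementation; the prompt format and the `summarize`/`translate` helpers are placeholder assumptions standing in for LLM calls.

```python
# Minimal sketch of Sent2Sent++ decoding with Doc-Guided Memory.
# summarize() and translate() are hypothetical stand-ins for LLM calls.

def summarize(document: str) -> str:
    # Placeholder: in the paper, an LLM produces the document summary.
    return document.split(". ")[0]

def translate(prompt: str) -> str:
    # Placeholder: in the paper, an LLM generates the target-language text.
    return "<translation of: " + prompt.splitlines()[-1] + ">"

def sent2sent_pp(document: str, sentences: list[str]) -> list[str]:
    # Doc-Guided Memory: only the summary and its translation persist
    # across the whole document, keeping memory overhead constant.
    summary = summarize(document)
    summary_translation = translate(summary)

    translations = []
    prev_src, prev_tgt = "", ""
    for sent in sentences:
        # Incremental decoding: global summary memory plus the single
        # adjacent sentence pair, so no sentence is ever skipped.
        prompt = (
            f"Summary: {summary}\n"
            f"Summary translation: {summary_translation}\n"
            f"Previous source: {prev_src}\n"
            f"Previous translation: {prev_tgt}\n"
            f"Translate: {sent}"
        )
        tgt = translate(prompt)
        translations.append(tgt)
        prev_src, prev_tgt = sent, tgt  # slide the local context window
    return translations
```

Because the persistent state never grows with document length, this design trades the full-document context of Doc2Doc for bounded memory while still injecting document-level guidance into every sentence.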
Problem

Research questions and friction points this paper is trying to address.

Translation Quality
Consistency with Original
Fluency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Document-Guided Translation
Long Document Integrity
Cross-Lingual Adaptability
Jiaxin Guo
Huawei Translation Services Center, Beijing, China
Yuanchang Luo
2012 Lab, Huawei Co. LTD
Daimeng Wei
Huawei Translation Services Center, Beijing, China
Ling Zhang
Alibaba DAMO Academy USA
Zongyao Li
Huawei Translation Services Center, Beijing, China
Hengchao Shang
Huawei Translation Services Center, Beijing, China
Zhiqiang Rao
Huawei
Shaojun Li
Engineer, 2012 Lab, Huawei Co. LTD
Jinlong Yang
Huawei Translation Services Center, Beijing, China
Zhanglin Wu
2012 Lab, Huawei Co. LTD
Hao Yang
Huawei Translation Services Center, Beijing, China