Alleviating Distribution Shift in Synthetic Data for Machine Translation Quality Estimation

📅 2025-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: Synthetic data for machine translation Quality Estimation (QE) suffers from distribution shift: pseudo-translations differ from authentic translations, and pseudo-labels are misaligned with human preferences. Method: ADSQE is a framework that combines constrained beam search, generation with multiple distinct translation models, reference-guided word-level annotation, and phrase-level labeling based on the shortest phrase covering consecutive error tokens. It also stresses that a translation model cannot accurately annotate its own translations, so a model is not used to evaluate its own outputs. Contribution/Results: ADSQE uses reference translations to guide both synthetic data generation and fine-grained annotation, and its shortest-error-phrase identification mimics human annotator behavior. Experiments show that ADSQE outperforms state-of-the-art baselines, including COMET, in both supervised and unsupervised settings. Further analysis offers insights into synthetic data generation that could benefit reward models for other tasks.

📝 Abstract
Quality Estimation (QE) models evaluate the quality of machine translations without reference translations, serving as reward models for the translation task. Because annotated QE data is scarce, synthetic data generation has emerged as a promising solution. However, synthetic QE data often suffers from distribution shift, which can manifest as discrepancies between pseudo and real translations, or as pseudo labels that do not align with human preferences. To tackle this issue, we introduce ADSQE, a novel framework for alleviating distribution shift in synthetic QE data. To reduce the difference between pseudo and real translations, we employ the constrained beam search algorithm and enhance translation diversity through the use of distinct generation models. ADSQE uses references, i.e., translation supervision signals, to guide both the generation and annotation processes, enhancing the quality of word-level labels. ADSQE further identifies the shortest phrase covering consecutive error tokens, mimicking human annotation behavior, to assign the final phrase-level labels. Specifically, we underscore that a translation model cannot accurately annotate its own translations. Extensive experiments demonstrate that ADSQE outperforms SOTA baselines like COMET in both supervised and unsupervised settings. Further analysis offers insights into synthetic data generation that could benefit reward models for other tasks.
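The phrase-level labeling step can be pictured with a small sketch. The abstract does not spell out how ADSQE computes the shortest covering phrase, so the code below makes a simplifying assumption: each maximal run of consecutive BAD word-level tags is treated as one error phrase. The function name `error_spans` and the OK/BAD tag strings are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def error_spans(tags: List[str]) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, one per maximal run of BAD tags.

    Assumption: a run of consecutive BAD tokens stands in for the
    "shortest phrase covering consecutive error tokens" in the abstract.
    """
    spans: List[Tuple[int, int]] = []
    start = None  # index where the current BAD run began, if any
    for i, tag in enumerate(tags):
        if tag == "BAD" and start is None:
            start = i  # a new error run opens here
        elif tag != "BAD" and start is not None:
            spans.append((start, i))  # the run closed just before i
            start = None
    if start is not None:
        spans.append((start, len(tags)))  # run reaches the end of the sentence
    return spans


tokens = ["The", "cat", "sat", "in", "a", "mat", "today"]
tags = ["OK", "OK", "OK", "BAD", "BAD", "OK", "BAD"]
for s, e in error_spans(tags):
    print(" ".join(tokens[s:e]))
```

Under this assumption, adjacent error tokens are merged into a single span rather than labeled one by one, which is the behavior the abstract attributes to human annotators.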
Problem

Research questions and friction points this paper is trying to address.

Alleviating distribution shift in synthetic data
Improving machine translation quality estimation
Enhancing synthetic data generation and annotation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Constrained beam search algorithm
Translation supervision signal guidance
Shortest phrase error identification
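To make the idea of reference guidance concrete, here is a rough sketch of how a reference can supply word-level supervision. It is not ADSQE's annotator, which this page does not describe in detail. As a stand-in, it aligns a pseudo-translation with its reference using Python's standard `difflib.SequenceMatcher` and tags every unaligned token as BAD. The function name `tag_against_reference` and the OK/BAD tag strings are hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def tag_against_reference(hyp: List[str], ref: List[str]) -> List[str]:
    """Tag each hypothesis token OK if it aligns with the reference, else BAD.

    Assumption: exact-match alignment is a crude proxy for reference-guided
    annotation; it ignores paraphrases and cannot flag omitted words.
    """
    tags = ["BAD"] * len(hyp)
    matcher = SequenceMatcher(a=hyp, b=ref, autojunk=False)
    for block in matcher.get_matching_blocks():
        # Each block marks hyp[block.a : block.a + block.size] as aligned.
        for i in range(block.a, block.a + block.size):
            tags[i] = "OK"
    return tags


hyp = "the cat sat in a mat".split()
ref = "the cat sat on the mat".split()
print(list(zip(hyp, tag_against_reference(hyp, ref))))
```

A heuristic like this penalizes valid rewordings, which is one reason a learned, reference-aware annotator would be expected to produce labels closer to human judgments.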
Xiang Geng
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Zhejian Lai
Master student of Nanjing University
Natural Language Processing
Jiajun Chen
National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
Hao Yang
Huawei Translation Services Center, Beijing, China
Shujian Huang
School of Computer Science, Nanjing University
Natural Language Processing, Machine Translation, Multilingualism, Large Language Models