Exposing the Cracks: Vulnerabilities of Retrieval-Augmented LLM-based Machine Translation

📅 2025-10-01
🤖 AI Summary
This work investigates how retrieval-augmented LLM-based machine translation (REAL-MT) behaves under noisy retrieval, focusing on idiom translation. The authors propose a noise synthesis framework and new robustness metrics, then evaluate Qwen-series models, including standard LLMs and large reasoning models (LRMs), across high-, medium-, and low-resource language pairs. Low-resource pairs, which rely more heavily on retrieved context, degrade more severely under noise. LRMs show no improvement in error correction and are more susceptible to noise, tending to rationalize incorrect context rather than correct it. The analysis links this to attention shifting from the source idiom to noisy content and to confidence rising while accuracy falls. Training-free and fine-tuning mitigations improve robustness at the cost of performance in clean contexts, revealing a trade-off between the two.

📝 Abstract
REtrieval-Augmented LLM-based Machine Translation (REAL-MT) shows promise for knowledge-intensive tasks like idiomatic translation, but its reliability under noisy retrieval contexts remains poorly understood despite this being a common challenge in real-world deployment. To address this gap, we propose a noise synthesis framework and new metrics to evaluate the robustness of REAL-MT systematically. Using this framework, we instantiate REAL-MT with Qwen-series models, including standard LLMs and large reasoning models (LRMs) with enhanced reasoning, and evaluate their performance on idiomatic translation across high-, medium-, and low-resource language pairs under synthesized noise. Our results show that low-resource language pairs, which rely more heavily on retrieved context, degrade more severely under noise than high-resource ones and often produce nonsensical translations. Although LRMs possess enhanced reasoning capabilities, they show no improvement in error correction and are even more susceptible to noise, tending to rationalize incorrect contexts. We find that this stems from an attention shift away from the source idiom to noisy content, while confidence increases despite declining accuracy, indicating poor calibration. To mitigate these issues, we investigate training-free and fine-tuning strategies, which improve robustness at the cost of performance in clean contexts, revealing a fundamental trade-off. Our findings highlight the limitations of current approaches, underscoring the need for self-verifying integration mechanisms.
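The abstract does not spell out how noise is synthesized or how robustness is scored, so the following is only a minimal sketch of the general idea under stated assumptions. The function names (`inject_noise`, `build_prompt`, `relative_degradation`), the replace-with-distractor noise model, the prompt wording, and the relative-drop score are all hypothetical illustrations, not the paper's framework or metrics.

```python
import random


def inject_noise(retrieved, distractors, noise_ratio, seed=0):
    """Replace a fraction of retrieved entries with distractor entries.

    Assumption: noise is modeled as swapping correct context for
    irrelevant or misleading context; the paper may use other noise types.
    """
    rng = random.Random(seed)
    n_noisy = round(noise_ratio * len(retrieved))
    noisy_idx = set(rng.sample(range(len(retrieved)), n_noisy))
    return [
        rng.choice(distractors) if i in noisy_idx else entry
        for i, entry in enumerate(retrieved)
    ]


def build_prompt(source, context, src_lang, tgt_lang):
    """Assemble a retrieval-augmented translation prompt (hypothetical wording)."""
    lines = [
        f"Translate the following {src_lang} sentence into {tgt_lang}.",
        "Retrieved context:",
    ]
    lines += [f"- {c}" for c in context]
    lines.append(f"Source: {source}")
    return "\n".join(lines)


def relative_degradation(clean_score, noisy_score):
    """Fraction of the clean-context score lost under noise (illustrative only)."""
    if clean_score == 0:
        return 0.0
    return (clean_score - noisy_score) / clean_score
```

In this sketch, one would translate the same source sentence once with the clean context and once with `inject_noise(...)` applied, score both outputs with any translation quality metric, and compare them with `relative_degradation`. A larger value would indicate a model that leans more heavily on unreliable context.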
Problem

Research questions and friction points this paper is trying to address.

Evaluating REAL-MT robustness under noisy retrieval contexts for translation
Analyzing performance degradation in low-resource language pairs with noise
Investigating mitigation strategies for noise-induced translation errors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Noise synthesis framework evaluates REAL-MT robustness
Training-free and fine-tuning strategies mitigate noise impact
Attention and confidence analysis reveals attention shift toward noisy content and poor calibration
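The calibration finding (confidence rising while accuracy falls) can be made concrete with expected calibration error (ECE), a standard calibration measure. The sketch below is an assumption about how such a check could look: the source does not say which calibration measure the authors use, and treating each translation as simply correct or incorrect is a simplification introduced here for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard equal-width-bin ECE.

    confidences: model confidence per example, each in [0, 1]
    correct:     booleans, True if the idiom was translated correctly
                 (binary correctness is an assumption of this sketch)
    """
    total = len(confidences)
    if total == 0:
        return 0.0

    # Group examples into equal-width confidence bins.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    # Weighted average gap between mean confidence and accuracy per bin.
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece
```

Computing this once on clean-context outputs and once on noisy-context outputs would show whether the gap between confidence and accuracy widens under noise, which is the pattern the abstract describes.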
Yanming Sun
NLP2CT Lab, University of Macau
Runzhe Zhan
Ph.D. Candidate, University of Macau
Machine Translation, Language Models, Multilinguality
Chi Seng Cheang
NLP2CT Lab, University of Macau
Han Wu
NLP2CT Lab, University of Macau
Xuebo Liu
Harbin Institute of Technology, Shenzhen
Yuyao Niu
School of Foreign Languages, South China University of Technology
Fengying Ye
NLP2CT Lab, University of Macau
Kaixin Lan
NLP2CT Lab, University of Macau
Lidia S. Chao
University of Macau
Derek F. Wong
Professor, Department of Computer and Information Science, University of Macau
Machine Translation, Neural Machine Translation, Natural Language Processing, Machine Learning