Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral

📅 2025-02-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the tension between computational cost and translation quality in machine translation, this paper proposes a cascaded inference framework guided by quality estimation (QE): a lightweight model translates every input by default, and a large model is invoked only when an off-the-shelf QE metric scores the lightweight model's translation below a predefined threshold. Existing QE metrics serve directly as the routing criterion, so no auxiliary deferral model has to be trained. Across multilingual translation tasks, the cascade matches the quality of the large model alone while invoking it for only 30% to 50% of inputs, which substantially reduces computational cost. Both automatic metrics and human evaluation support this efficiency-quality trade-off.
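The routing rule described in this summary can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `small_translate`, `large_translate`, and `qe_score` are hypothetical callables standing in for the two translation models and a reference-free QE metric, and the assumption that higher scores mean better quality is ours.

```python
from typing import Callable, List, Tuple


def cascade_translate(
    sources: List[str],
    small_translate: Callable[[str], str],   # hypothetical: lightweight model
    large_translate: Callable[[str], str],   # hypothetical: large model
    qe_score: Callable[[str, str], float],   # hypothetical: QE(source, hypothesis)
    threshold: float,
) -> Tuple[List[str], float]:
    """Translate with the small model and defer low-QE inputs to the large one.

    Returns the final translations and the fraction of inputs that were deferred.
    Assumes higher QE scores indicate better translations.
    """
    outputs: List[str] = []
    deferred = 0
    for src in sources:
        draft = small_translate(src)
        if qe_score(src, draft) < threshold:
            # Predicted quality is too low: fall back to the large model.
            outputs.append(large_translate(src))
            deferred += 1
        else:
            outputs.append(draft)
    rate = deferred / len(sources) if sources else 0.0
    return outputs, rate
```

The returned deferral rate is the quantity the summary reports as 30% to 50%; in this sketch it is a consequence of the chosen threshold rather than something set directly.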

📝 Abstract
Larger models often outperform smaller ones but come with high computational costs. Cascading offers a potential solution. By default, it uses smaller models and defers only some instances to larger, more powerful models. However, designing effective deferral rules remains a challenge. In this paper, we propose a simple yet effective approach for machine translation, using existing quality estimation (QE) metrics as deferral rules. We show that QE-based deferral allows a cascaded system to match the performance of a larger model while invoking it for a small fraction (30% to 50%) of the examples, significantly reducing computational costs. We validate this approach through both automatic and human evaluation.
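One way to read the "small fraction (30% to 50%)" figure is as a deferral budget: given QE scores for the small model's outputs, send the lowest-scoring share to the larger model. The sketch below illustrates that reading only. The function name `deferral_mask`, the use of `floor` to turn the budget into a count, and the higher-is-better score convention are assumptions, not details taken from the paper.

```python
import math
from typing import List


def deferral_mask(qe_scores: List[float], budget: float) -> List[bool]:
    """Mark the lowest-scoring `budget` fraction of examples for deferral.

    `qe_scores[i]` is the QE score of the small model's translation of example i
    (assumed higher-is-better). Returns True where the example should be sent
    to the larger model.
    """
    if not 0.0 <= budget <= 1.0:
        raise ValueError("budget must be in [0, 1]")
    n_defer = math.floor(budget * len(qe_scores))
    # Indices sorted from worst to best predicted quality.
    order = sorted(range(len(qe_scores)), key=lambda i: qe_scores[i])
    chosen = set(order[:n_defer])
    return [i in chosen for i in range(len(qe_scores))]


# Usage: with a 50% budget, the two lowest-scoring of four examples are deferred.
mask = deferral_mask([0.9, 0.2, 0.5, 0.1], budget=0.5)
```

Fixing the budget rather than the threshold makes the compute cost predictable in advance, at the price of needing the scores for a whole batch before routing any of it.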
Problem

Research questions and friction points this paper is trying to address.

Reduce computational costs in translation
Use quality estimation for deferral rules
Match larger model performance efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cascaded Translation Systems
Quality Estimation Metrics
Computational Cost Reduction
António Farinhas
Sword Health
Machine Learning, Natural Language Processing
Nuno M. Guerreiro
Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa, MICS, CentraleSupélec, Université Paris-Saclay
Sweta Agrawal
Research Scientist at Google
Machine Translation, Natural Language Generation and Evaluation
Ricardo Rei
Sword Health
Healthcare AI, Machine Learning, Natural Language Processing, Large Language Models
André F.T. Martins
Instituto de Telecomunicações, Instituto Superior Técnico, Universidade de Lisboa, Unbabel, ELLIS Unit Lisbon