Translate Smart, not Hard: Cascaded Translation Systems with Quality-Aware Deferral

📅 2025-02-18

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Machine translation faces a tension between the high computational cost of large models and the demand for high translation quality. This paper proposes a cascaded inference framework guided by quality estimation (QE): a lightweight model serves as the default translator, and a large model is invoked only when an existing QE metric predicts that the lightweight model's translation falls below a predefined threshold. Because the QE metric is used off the shelf as an explicit deferral rule, the routing decision needs no auxiliary training. In the reported experiments, the cascaded system matches the performance of the large model alone while invoking it for only 30% to 50% of inputs, which substantially reduces computational cost. The approach is validated through both automatic metrics and human evaluation.
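The deferral rule described in this summary can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `cascade_translate`, `CascadeResult`, and the three toy callables are hypothetical names, and the toy "models" and "QE metric" are stand-ins so the snippet runs without any translation library. In practice the callables would wrap a small model, a large model, and a reference-free QE metric.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CascadeResult:
    translation: str
    deferred: bool   # True if the large model produced the final output
    qe_score: float  # QE score of the small model's draft


def cascade_translate(
    source: str,
    small_model: Callable[[str], str],
    large_model: Callable[[str], str],
    qe_metric: Callable[[str, str], float],
    threshold: float,
) -> CascadeResult:
    """Translate with the small model; defer to the large one on low QE."""
    draft = small_model(source)
    score = qe_metric(source, draft)  # reference-free: source and draft only
    if score < threshold:
        return CascadeResult(large_model(source), True, score)
    return CascadeResult(draft, False, score)


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_small(source: str) -> str:
    # Pretend the small model fails on longer inputs.
    return source.upper() if len(source) < 10 else "???"


def toy_large(source: str) -> str:
    return source.upper()


def toy_qe(source: str, hypothesis: str) -> float:
    # Pretend the QE metric detects the failure case.
    return 0.0 if hypothesis == "???" else 1.0
```

With these stand-ins, a short input is kept from the small model, while a longer input scores below the threshold and is re-translated by the large model. The large model runs only on deferred inputs, which is where the cost saving comes from.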

📝 Abstract

Larger models often outperform smaller ones but come with high computational costs. Cascading offers a potential solution. By default, it uses smaller models and defers only some instances to larger, more powerful models. However, designing effective deferral rules remains a challenge. In this paper, we propose a simple yet effective approach for machine translation, using existing quality estimation (QE) metrics as deferral rules. We show that QE-based deferral allows a cascaded system to match the performance of a larger model while invoking it for a small fraction (30% to 50%) of the examples, significantly reducing computational costs. We validate this approach through both automatic and human evaluation.
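One practical question the abstract raises is how to set the deferral threshold so that the large model is invoked for a chosen fraction of inputs (for example, 30% to 50%). The sketch below shows one simple possibility, assuming QE scores of the small model's outputs are available on a development set; the quantile-based rule and the function names are assumptions for illustration, not a procedure stated in the abstract.

```python
import math
from typing import Sequence


def threshold_for_deferral_rate(dev_scores: Sequence[float],
                                target_rate: float) -> float:
    """Pick a threshold so that about `target_rate` of dev items score below it."""
    if not dev_scores:
        raise ValueError("dev_scores must not be empty")
    if not 0.0 <= target_rate <= 1.0:
        raise ValueError("target_rate must be in [0, 1]")
    ranked = sorted(dev_scores)
    k = round(target_rate * len(ranked))  # number of items to defer
    if k >= len(ranked):
        return math.inf                   # defer everything
    # Items scoring strictly below ranked[k] are deferred.
    return ranked[k]


def deferral_rate(scores: Sequence[float], threshold: float) -> float:
    """Fraction of items whose QE score falls below the threshold."""
    return sum(s < threshold for s in scores) / len(scores)


# Made-up QE scores, for illustration only.
dev_scores = [0.9, 0.2, 0.5, 0.7, 0.4, 0.8, 0.3, 0.6, 0.95, 0.1]
threshold = threshold_for_deferral_rate(dev_scores, 0.3)
```

If several development items share the threshold score, fewer items than requested are deferred, because only scores strictly below the threshold trigger deferral. The rate observed on new data can also differ from the one measured on the development set.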

Problem

Research questions and friction points this paper is trying to address.

Reduce computational costs in translation

Use quality estimation for deferral rules

Match larger model performance efficiently

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cascaded Translation Systems

Quality Estimation Metrics

Computational Cost Reduction
