EBBS: An Ensemble with Bi-Level Beam Search for Zero-Shot Machine Translation

📅 2024-02-29
🏛️ arXiv.org
📈 Citations: 4
Influential: 0
🤖 AI Summary
Zero-shot neural machine translation (NMT) is noisy and underperforms under both direct translation and English-centric pivoting. The paper proposes EBBS, an ensemble with a bi-level beam search: at the lower level, each pre-trained multilingual model expands its own hypotheses step by step; at the upper level, the components are synchronized by a soft-voting mechanism over candidate hypotheses. The ensemble's knowledge can further be distilled back into the original multilingual model to improve inference efficiency. On the OPUS-100 and Tatoeba benchmarks, EBBS consistently outperforms direct translation, pivot-based approaches, and existing ensemble baselines; notably, the distilled model runs faster at inference without sacrificing, and in fact improving, BLEU.

📝 Abstract
The ability of zero-shot translation emerges when we train a multilingual model with certain translation directions; the model can then directly translate in unseen directions. Alternatively, zero-shot translation can be accomplished by pivoting through a third language (e.g., English). In our work, we observe that both direct and pivot translations are noisy and achieve less satisfactory performance. We propose EBBS, an ensemble method with a novel bi-level beam search algorithm, where each ensemble component explores its own prediction step by step at the lower level but they are synchronized by a "soft voting" mechanism at the upper level. Results on two popular multilingual translation datasets show that EBBS consistently outperforms direct and pivot translations as well as existing ensemble techniques. Further, we can distill the ensemble's knowledge back to the multilingual model to improve inference efficiency; profoundly, our EBBS-based distillation does not sacrifice, or even improves, the translation quality.
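The abstract's core idea can be illustrated with a toy sketch. This is not the paper's exact EBBS algorithm, only a minimal illustration of the two levels it describes: each ensemble component scores next-token expansions independently (lower level), and expansions are merged by averaging the components' probabilities, i.e., soft voting (upper level). The vocabulary, the two lookup-table "models", and `ebbs_sketch` are all hypothetical stand-ins for real multilingual NMT models.

```python
import math

# Hypothetical toy vocabulary and two "ensemble components". In the paper
# these would be pre-trained multilingual NMT models; here each is just a
# function mapping a prefix to a next-token probability distribution.
VOCAB = ["a", "b", "</s>"]

def model1(prefix):
    return {"a": 0.6, "b": 0.3, "</s>": 0.1}

def model2(prefix):
    return {"a": 0.2, "b": 0.5, "</s>": 0.3}

def ebbs_sketch(models, beam_size=2, max_len=3):
    """Simplified bi-level beam search with soft voting (an illustrative
    sketch, not the exact EBBS algorithm)."""
    beams = [((), 0.0)]  # (token prefix, cumulative log-probability)
    for _ in range(max_len):
        candidates = {}
        for prefix, score in beams:
            if prefix and prefix[-1] == "</s>":
                # Finished hypotheses are carried over unchanged.
                candidates[prefix] = max(candidates.get(prefix, -math.inf), score)
                continue
            # Lower level: each model scores expansions independently.
            dists = [m(prefix) for m in models]
            for tok in VOCAB:
                # Upper level: soft voting averages the models' probabilities.
                voted = sum(d[tok] for d in dists) / len(dists)
                cand = prefix + (tok,)
                s = score + math.log(voted)
                candidates[cand] = max(candidates.get(cand, -math.inf), s)
        # Keep the top hypotheses as the shared beam for the next step.
        beams = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)[:beam_size]
    return beams[0][0]
```

Note how the upper level keeps the components synchronized: every model scores the same shared beam at each step, so no single model's search can drift away from the ensemble's consensus.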
Problem

Research questions and friction points this paper is trying to address.

Improves zero-shot translation accuracy
Reduces noise in direct and pivot translations
Enhances multilingual model efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Ensemble method
Bi-level beam search
Soft voting mechanism
Authors

Yuqiao Wen — Dept. Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta
Behzad Shayegh — Dept. Computing Science, Alberta Machine Intelligence Institute (Amii), University of Alberta
Chenyang Huang — Ph.D. Student, University of Alberta (ML, DL, NLP, CV)
Yanshuai Cao — Borealis AI (Artificial Intelligence, Machine Learning, Generative Models, Natural Language Processing, Computer Vision)
Lili Mou — University of Alberta (Natural Language Processing, Machine Learning)