🤖 AI Summary
Existing non-autoregressive multilingual neural machine translation (MNMT) heavily relies on computationally expensive knowledge distillation (KD) to achieve competitive performance, hindering efficiency and scalability. This paper proposes M-DAT, the first KD-free framework for non-autoregressive multilingual translation. Built upon the Directed Acyclic Transformer (DAT) architecture, M-DAT integrates multilingual joint training with a novel pivot back-translation (PivotBT) strategy to explicitly model latent cross-lingual alignments, thereby substantially improving zero-shot generalization to unseen language directions. Evaluated on standard multilingual benchmarks, M-DAT achieves state-of-the-art performance among non-autoregressive models: it attains a 3.2× speedup over autoregressive baselines while incurring only a marginal BLEU degradation of 0.4-0.8 points. Thus, M-DAT bridges the longstanding trade-off between inference efficiency and translation accuracy in multilingual NMT, enabling scalable, high-fidelity non-autoregressive translation without KD.
📄 Abstract
Multilingual neural machine translation (MNMT) aims to handle multiple translation directions with a single model. Recent work applies non-autoregressive Transformers to improve the efficiency of MNMT, but requires an expensive knowledge distillation (KD) process. In this work, we propose M-DAT, an approach to non-autoregressive multilingual machine translation. Our system leverages the recent directed acyclic Transformer (DAT), which does not require KD. We further propose a pivot back-translation (PivotBT) approach to improve generalization to unseen translation directions. Experiments show that our M-DAT achieves state-of-the-art performance in non-autoregressive MNMT.
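The pivot back-translation idea can be sketched in a few lines. The following is a minimal illustrative sketch, not the paper's actual implementation: it assumes a hypothetical `translate` call standing in for the multilingual model, and constructs a synthetic training pair for an unseen direction by routing through a pivot language (e.g. English).

```python
def translate(sentence, src_lang, tgt_lang, model):
    """Toy stand-in for a multilingual NMT model call.

    A real system would invoke the trained (non-autoregressive)
    translator; here `model` is just a dict of direction-keyed callables.
    """
    return model[(src_lang, tgt_lang)](sentence)


def pivot_back_translate(tgt_sentence, src_lang, tgt_lang, pivot_lang, model):
    """Create a synthetic (src, tgt) pair for an unseen direction.

    Starting from a sentence in the target language, translate it into
    the pivot language and then into the source language; the resulting
    synthetic source is paired with the original target sentence.
    """
    pivot_sentence = translate(tgt_sentence, tgt_lang, pivot_lang, model)
    synthetic_src = translate(pivot_sentence, pivot_lang, src_lang, model)
    return synthetic_src, tgt_sentence


# Toy word-level "model" covering only the directions seen in training
# (de->en and en->fr); fr->de is the unseen, zero-shot direction.
toy_model = {
    ("de", "en"): lambda s: {"hallo welt": "hello world"}[s],
    ("en", "fr"): lambda s: {"hello world": "bonjour le monde"}[s],
}

# Build a synthetic fr->de pair from a monolingual German sentence.
src, tgt = pivot_back_translate("hallo welt", "fr", "de", "en", toy_model)
print(src, "->", tgt)  # bonjour le monde -> hallo welt
```

The synthetic pair can then be mixed into training so the model sees the otherwise unseen source-target direction; the exact sampling and training schedule used by M-DAT is not specified in this abstract.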