From SALAMANDRA to SALAMANDRATA: BSC Submission for WMT25 General Machine Translation Shared Task

πŸ“… 2025-08-18
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses the limited generalization of multilingual machine translation (MT) across both European and non-European languages. To this end, the authors propose the SALAMANDRATA series of models, trained with a two-stage recipe of continual pre-training on parallel data followed by supervised fine-tuning on high-quality multilingual instructions. Starting from models covering 38 European languages, they adapt the vocabulary to the non-European languages newly introduced in WMT25 and enhance inference with two quality-aware decoding strategies: COMET-guided Minimum Bayes Risk decoding and COMET-KIWI-guided tuned re-ranking. The contributions are: (1) open-weight models with 2B and 7B parameters, along with an improved SALAMANDRATA-V2; (2) a 7B-based BSC submission optimized across all translation directions of the WMT25 shared task; and (3) public release of all models on Hugging Face. This work advances the practical deployment of large-scale, robust multilingual MT systems.

πŸ“ Abstract
In this paper, we present the SALAMANDRATA family of models, an improved iteration of SALAMANDRA LLMs (Gonzalez-Agirre et al., 2025) specifically trained to achieve strong performance in translation-related tasks for 38 European languages. SALAMANDRATA comes in two scales: 2B and 7B parameters. For both versions, we applied the same training recipe with a first step of continual pre-training on parallel data, and a second step of supervised fine-tuning on high-quality instructions. The BSC submission to the WMT25 General Machine Translation shared task is based on the 7B variant of SALAMANDRATA. We first adapted the model vocabulary to support the additional non-European languages included in the task. This was followed by a second phase of continual pre-training and supervised fine-tuning, carefully designed to optimize performance across all translation directions for this year's shared task. For decoding, we employed two quality-aware strategies: Minimum Bayes Risk Decoding and Tuned Re-ranking using COMET and COMET-KIWI respectively. We publicly release both the 2B and 7B versions of SALAMANDRATA, along with the newer SALAMANDRATA-V2 model, on Hugging Face.
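As a rough illustration of the first decoding strategy, the sketch below implements COMET-guided Minimum Bayes Risk decoding: each sampled candidate is scored against every other candidate treated as a pseudo-reference, and the hypothesis with the highest average utility is selected. This is a minimal sketch, not the authors' pipeline; the unbabel-comet package calls are real, but the checkpoint name, helper function, and toy candidates are assumptions.

```python
# Hypothetical MBR decoding sketch with COMET as the utility metric.
# Requires: pip install unbabel-comet
from comet import download_model, load_from_checkpoint

def mbr_select(source, candidates, comet_model, gpus=0):
    """Return the candidate with the highest mean COMET score,
    treating all other candidates as pseudo-references."""
    data, owner = [], []
    for i, hyp in enumerate(candidates):
        for j, ref in enumerate(candidates):
            if i != j:
                data.append({"src": source, "mt": hyp, "ref": ref})
                owner.append(i)
    scores = comet_model.predict(data, batch_size=32, gpus=gpus).scores
    # Expected utility: mean score of each hypothesis over its pseudo-references.
    totals = [0.0] * len(candidates)
    for i, s in zip(owner, scores):
        totals[i] += s
    n_refs = len(candidates) - 1
    best = max(range(len(candidates)), key=lambda i: totals[i] / n_refs)
    return candidates[best]

if __name__ == "__main__":
    # Checkpoint name is illustrative; candidates would normally be
    # sampled from the MT model itself.
    model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
    candidates = ["Hola, món!", "Hola món", "Salut, món!"]
    print(mbr_select("Hello, world!", candidates, model))
```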
Problem

Research questions and friction points this paper is trying to address.

Improving machine translation for 38 European languages
Adapting the model vocabulary for non-European languages (see the sketch after this list)
Optimizing performance with quality-aware decoding strategies
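As a rough sketch of what vocabulary adaptation involves, the snippet below adds new subword tokens to a tokenizer and grows the embedding matrix to match. The checkpoint name and tokens are illustrative, and the paper does not describe its extension procedure at this level of detail.

```python
# Hypothetical vocabulary-adaptation sketch using Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "BSC-LT/salamandra-7b"  # illustrative checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# New subword tokens for the added non-European languages (toy examples).
new_tokens = ["▁こんにけは", "▁δΈ–η•Œ", "▁μ•ˆλ…•ν•˜μ„Έμš”"]
num_added = tokenizer.add_tokens(new_tokens)

# Grow the embedding matrix so the new token ids have rows; the new rows
# are randomly initialized and must be learned during continual pre-training.
model.resize_token_embeddings(len(tokenizer))
print(f"Added {num_added} tokens; vocab size is now {len(tokenizer)}")
```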
Innovation

Methods, ideas, or system contributions that make the work stand out.

Continual pre-training on parallel data
Supervised fine-tuning on high-quality instructions
Quality-aware decoding with COMET-guided MBR and COMET-KIWI re-ranking (see the sketch below)
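The second decoding strategy scores candidates without a reference translation. Below is a minimal reference-free re-ranking sketch under the same assumptions as the MBR example above (unbabel-comet, illustrative checkpoint name); the paper's "tuned" variant additionally learns how to weight the quality signals, which is omitted here.

```python
# Minimal reference-free (quality-estimation) re-ranking sketch.
# COMET-KIWI scores (source, hypothesis) pairs directly, so no
# reference is needed. Checkpoint name is illustrative.
from comet import download_model, load_from_checkpoint

def qe_rerank(source, candidates, kiwi_model, gpus=0):
    """Return the candidate that COMET-KIWI rates highest."""
    data = [{"src": source, "mt": hyp} for hyp in candidates]
    scores = kiwi_model.predict(data, batch_size=32, gpus=gpus).scores
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best]

if __name__ == "__main__":
    kiwi = load_from_checkpoint(download_model("Unbabel/wmt22-cometkiwi-da"))
    candidates = ["Hola, món!", "Hola món", "Salut, món!"]
    print(qe_rerank("Hello, world!", candidates, kiwi))
```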
Javier Garcia Gilabert
Barcelona Supercomputing Center
Xixian Liao
Barcelona Supercomputing Center
Computational Linguistics, Machine Translation
Severino Da Dalt
Barcelona Supercomputing Center
Ella Bohman
Barcelona Supercomputing Center
Audrey Mash
Barcelona Supercomputing Center
Francesca De Luca Fornaciari
Barcelona Supercomputing Center
Irene Baucells
Barcelona Supercomputing Center
NLP
Joan Llop
Barcelona Supercomputing Center
Miguel Claramunt Argote
Barcelona Supercomputing Center
Carlos Escolano
Polytechnic University of Catalonia, Barcelona
Machine Translation, Machine Learning, Speech Translation
Maite Melero
Senior Researcher, Barcelona Supercomputing Center