Tower+: Bridging Generality and Translation Specialization in Multilingual LLMs

📅 2025-06-20
📈 Citations: 0
✹ Influential: 0
🀖 AI Summary
This study addresses the fundamental trade-off between task specialization (e.g., machine translation) and general-purpose capabilities (e.g., dialogue, reasoning, instruction following) in multilingual large language models. We propose a multi-stage training recipe that traces a Pareto frontier between the two: continued pretraining → supervised fine-tuning → preference optimization → reinforcement learning with verifiable rewards, integrated with multi-task data generation and rigorous filtering at every stage. We develop a family of multilingual models at three scales (2B, 9B, and 72B parameters) and introduce IF-MT, a benchmark for jointly evaluating machine translation and instruction following. Experimentally, the 2B and 9B models often outperform much larger general-purpose LLMs such as Llama 3.3 70B and GPT-4o; the 72B model delivers best-in-class translation performance on high-resource languages and top results on Multilingual Arena Hard and IF-MT, showing that translation specialization and broad general capabilities can be strengthened together.
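To make the staged recipe concrete, here is a minimal Python sketch of how such a pipeline chains one checkpoint through the four stages. Every name in it (Checkpoint, the stage functions, the dataset labels) is illustrative, not from the paper's codebase; each stub stands in for a full training run.

```python
# Minimal sketch of the four-stage recipe described above.
# All names are hypothetical; each function is a stub standing
# in for a complete trainer over the named kind of data.

from dataclasses import dataclass, field

@dataclass
class Checkpoint:
    name: str
    history: list = field(default_factory=list)

def continued_pretraining(ckpt, corpus):
    # Mix of monolingual and parallel data to broaden multilingual coverage.
    ckpt.history.append(f"CPT on {corpus}")
    return ckpt

def supervised_finetuning(ckpt, tasks):
    # Curated instruction data: translation, code, math, general chat.
    ckpt.history.append(f"SFT on {tasks}")
    return ckpt

def preference_optimization(ckpt, pairs):
    # Preference pairs (chosen vs. rejected model outputs).
    ckpt.history.append(f"PO on {pairs}")
    return ckpt

def rl_verifiable_rewards(ckpt, signals):
    # Rewards that can be checked automatically (e.g., exact answers, tests).
    ckpt.history.append(f"RLVR on {signals}")
    return ckpt

model = Checkpoint("base-9b")
for stage, data in [
    (continued_pretraining, "multilingual corpus"),
    (supervised_finetuning, "multi-task SFT set"),
    (preference_optimization, "preference pairs"),
    (rl_verifiable_rewards, "verifiable tasks"),
]:
    model = stage(model, data)

print(*model.history, sep="\n")
```

The point of the sketch is only the sequencing: each stage consumes the previous stage's checkpoint, matching the continued-pretraining-first ordering described above.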

📝 Abstract
Fine-tuning pretrained LLMs has been shown to be an effective strategy for reaching state-of-the-art performance on specific tasks like machine translation. However, this process of adaptation often implies sacrificing general-purpose capabilities, such as conversational reasoning and instruction-following, hampering the utility of the system in real-world applications that require a mixture of skills. In this paper, we introduce Tower+, a suite of models designed to deliver strong performance across both translation and multilingual general-purpose text capabilities. We achieve a Pareto frontier between translation specialization and multilingual general-purpose capabilities by introducing a novel training recipe that builds on Tower (Alves et al., 2024), comprising continued pretraining, supervised fine-tuning, preference optimization, and reinforcement learning with verifiable rewards. At each stage of training, we carefully generate and curate data to strengthen performance on translation as well as general-purpose tasks involving code generation, mathematics problem solving, and general instruction-following. We develop models at multiple scales: 2B, 9B, and 72B. Our smaller models often outperform larger general-purpose open-weight and proprietary LLMs (e.g., Llama 3.3 70B, GPT-4o). Our largest model delivers best-in-class translation performance for high-resource languages and top results in multilingual Arena Hard evaluations and in IF-MT, a benchmark we introduce for evaluating both translation and instruction-following. Our findings highlight that it is possible to rival frontier models in general capabilities, while optimizing for specific business domains, such as translation and localization.
Problem

Research questions and friction points this paper is trying to address.

Balancing translation specialization with general multilingual capabilities
Avoiding performance loss in general tasks during fine-tuning
Achieving top translation and multilingual task performance simultaneously
Innovation

Methods, ideas, or system contributions that make the work stand out.

Four-stage training recipe: continued pretraining, supervised fine-tuning, preference optimization, and reinforcement learning with verifiable rewards
Careful data generation and filtering at every stage, covering translation, code, math, and instruction following
IF-MT, a new benchmark for jointly evaluating translation and instruction following (see the sketch below)
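As a rough illustration of what IF-MT measures, the toy sketch below scores a candidate translation on (a) a stand-in quality signal and (b) compliance with an extra instruction attached to the translation request. The functions, the example texts, and the bracket-preservation rule are all hypothetical; the paper's actual metric (e.g., a learned MT metric) and instruction set are not reproduced here.

```python
# Hypothetical IF-MT-style check: score translation quality and
# instruction compliance together. Toy rules only; not the paper's metric.

def follows_instruction(output: str, instruction: str) -> bool:
    """Toy compliance check, e.g. 'keep bracketed terms verbatim'."""
    if instruction == "keep_brackets_verbatim":
        return "[Tower+]" in output  # bracketed term must survive untouched
    return True

def translation_score(output: str, reference: str) -> float:
    """Crude token-overlap stand-in for a learned MT metric like COMET."""
    overlap = set(output.lower().split()) & set(reference.lower().split())
    return len(overlap) / max(len(reference.split()), 1)

source = "O [Tower+] equilibra tradução e capacidades gerais."
reference = "The [Tower+] balances translation and general capabilities."
candidate = "[Tower+] balances translation and general capabilities."

quality = translation_score(candidate, reference)
compliant = follows_instruction(candidate, "keep_brackets_verbatim")
print(f"source={source!r}")
print(f"quality={quality:.2f}, instruction_followed={compliant}")
```

A real evaluation would replace both toy functions with a learned quality metric and task-specific compliance checks, but the joint report (quality, followed) is the shape of signal IF-MT is described as providing.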
Authors
Ricardo Rei
Sword Health
Nuno M. Guerreiro
Unbabel, Instituto de TelecomunicaçÔes, Instituto Superior Técnico & Universidade de Lisboa (Lisbon ELLIS Unit), MICS, CentraleSupélec, Université Paris-Saclay
José Pombal
Sword Health
JoĂŁo Alves
Unbabel, Instituto de TelecomunicaçÔes, Instituto Superior Técnico & Universidade de Lisboa (Lisbon ELLIS Unit)
Pedro Teixeirinha
Unbabel
Amin Farajian
Unbabel
André F. T. Martins
Unbabel, Instituto de TelecomunicaçÔes, Instituto Superior Técnico & Universidade de Lisboa (Lisbon ELLIS Unit)