The Effect of Language Diversity When Fine-Tuning Large Language Models for Translation

📅 2025-05-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
The impact of language diversity on fine-tuning large language models (LLMs) for machine translation remains contested. Method: We conduct controlled experiments across 132 translation directions to systematically investigate how varying language diversity during supervised and unsupervised fine-tuning affects translation performance and cross-lingual representation learning. Contribution/Results: We provide the first empirical evidence that moderate increases in language diversity significantly improve both supervised and unsupervised translation quality while enhancing language-agnostic representation learning; however, performance gains saturate—and eventually degrade—beyond an optimal diversity threshold. Crucially, we demonstrate that this effect stems from diversity-induced generalization and disentanglement in cross-lingual representations, rather than mere data augmentation. Our findings yield quantifiable, theoretically grounded principles for configuring language diversity in multilingual LLM fine-tuning.

Technology Category

Application Category

📝 Abstract
Prior research diverges on language diversity in LLM fine-tuning: Some studies report benefits while others find no advantages. Through controlled fine-tuning experiments across 132 translation directions, we systematically resolve these disparities. We find that expanding language diversity during fine-tuning improves translation quality for both unsupervised and -- surprisingly -- supervised pairs, despite less diverse models being fine-tuned exclusively on these supervised pairs. However, benefits plateau or decrease beyond a certain diversity threshold. We show that increased language diversity creates more language-agnostic representations. These representational adaptations help explain the improved performance in models fine-tuned with greater diversity.
Problem

Research questions and friction points this paper is trying to address.

Impact of language diversity on LLM fine-tuning for translation
Resolution of conflicting prior research on language diversity benefits
Optimal language diversity threshold for translation quality improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Expanding language diversity improves translation quality
Benefits plateau beyond certain diversity threshold
Diverse fine-tuning creates language-agnostic representations
🔎 Similar Papers
No similar papers found.
David Stap
David Stap
NXAI
Machine TranslationMachine LearningNatural Language Processing
C
C. Monz
Language Technology Lab, University of Amsterdam