🤖 AI Summary
The impact of language diversity on fine-tuning large language models (LLMs) for machine translation remains contested. Method: We conduct controlled experiments across 132 translation directions to systematically investigate how varying language diversity during fine-tuning affects translation performance, on both supervised and unsupervised language pairs, and cross-lingual representation learning. Contribution/Results: We provide the first empirical evidence that moderate increases in language diversity significantly improve translation quality for both supervised and unsupervised pairs while enhancing language-agnostic representation learning; however, performance gains saturate, and eventually degrade, beyond an optimal diversity threshold. Crucially, we demonstrate that this effect stems from diversity-induced generalization and disentanglement in cross-lingual representations, rather than from mere data augmentation. Our findings yield quantifiable, theoretically grounded principles for configuring language diversity in multilingual LLM fine-tuning.
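A note on the experimental scale (an inference from the count, not something the summary states): 132 is exactly the number of ordered pairs in a 12-language set, so the grid plausibly covers every directed translation pair among 12 languages:

$$n(n-1) = 12 \times 11 = 132$$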
📝 Abstract
Prior research diverges on language diversity in LLM fine-tuning: some studies report benefits while others find no advantages. Through controlled fine-tuning experiments across 132 translation directions, we systematically resolve these discrepancies. We find that expanding language diversity during fine-tuning improves translation quality for both unsupervised and, surprisingly, supervised pairs, even though less diverse models are fine-tuned exclusively on these supervised pairs. However, benefits plateau or decrease beyond a certain diversity threshold. We show that increased language diversity yields more language-agnostic representations, and these representational adaptations help explain the improved performance of models fine-tuned with greater diversity.
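To make "language-agnostic representations" concrete, below is a minimal sketch of one standard probe for this property, not the paper's actual protocol: embed parallel sentences in several languages and measure how close their pooled hidden states are. The model name and sentences are illustrative placeholders; under this measure, a model whose parallel-sentence embeddings sit closer together counts as more language-agnostic.

```python
# Sketch (assumed setup, not the paper's code): quantify language-agnosticism
# as the mean cross-lingual cosine similarity of mean-pooled hidden states
# for parallel sentences.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bigscience/bloom-560m"  # placeholder; not the paper's model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModel.from_pretrained(MODEL_NAME)
model.eval()

# Toy parallel data: the same sentence in three languages (illustrative only).
parallel = {
    "en": "The cat sleeps on the sofa.",
    "de": "Die Katze schläft auf dem Sofa.",
    "fr": "Le chat dort sur le canapé.",
}

@torch.no_grad()
def embed(text: str) -> torch.Tensor:
    """Mean-pool the final hidden states over non-padding tokens."""
    inputs = tokenizer(text, return_tensors="pt")
    hidden = model(**inputs).last_hidden_state      # (1, seq_len, dim)
    mask = inputs["attention_mask"].unsqueeze(-1)   # (1, seq_len, 1)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (1, dim)

# Average pairwise cosine similarity across languages: higher values suggest
# more language-agnostic (meaning-aligned) representations.
embs = {lang: embed(text) for lang, text in parallel.items()}
langs = list(embs)
sims = [
    torch.nn.functional.cosine_similarity(embs[a], embs[b]).item()
    for i, a in enumerate(langs)
    for b in langs[i + 1:]
]
print(f"mean cross-lingual cosine similarity: {sum(sims) / len(sims):.3f}")
```

Read against the abstract, the claim is that fine-tuning with more languages pushes a similarity of this kind upward, and that this representational shift tracks the observed translation-quality gains.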