Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Experimental Setups Matter

📅 2025-01-24

🤖 AI Summary

This study examines how linguistic similarity affects cross-lingual transfer, motivated by the problem of choosing source languages when target-language resources are scarce. It evaluates zero-shot and fine-tuned transfer performance of mBERT and XLM-R across 266 typologically diverse languages on part-of-speech tagging, dependency parsing, and topic classification. Using several language distance measures (genealogical, phonological, morphological, and treebank-based), it analyzes how well similarity predicts transfer performance. The key finding is that the effect of linguistic similarity is conditional: it depends on the task type (syntactic vs. semantic), the model's input representations, and the specific definition of similarity. These results challenge the simplified assumption that similarity implies benefit: similarity is neither universally predictive nor invariant across tasks or models. The findings can inform task-aware cross-lingual data selection in low-resource settings.

📝 Abstract

Cross-lingual transfer is a popular approach to increase the amount of training data for NLP tasks in a low-resource context. However, the best strategy to decide which cross-lingual data to include is unclear. Prior research often focuses on a small set of languages from a few language families and/or a single task. It is still an open question how these findings extend to a wider variety of languages and tasks. In this work, we analyze cross-lingual transfer for 266 languages from a wide variety of language families. Moreover, we include three popular NLP tasks: POS tagging, dependency parsing, and topic classification. Our findings indicate that the effect of linguistic similarity on transfer performance depends on a range of factors: the NLP task, the (mono- or multilingual) input representations, and the definition of linguistic similarity.
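
One simple way to probe the relationship the abstract describes is to rank-correlate a similarity measure with transfer scores across candidate source languages for a fixed target. The sketch below is only an illustration: the choice of Spearman correlation is an assumption rather than the paper's stated method, the language labels are hypothetical, and the numbers are made-up placeholders, not results from the paper.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical placeholder values for one target language (NOT paper results).
similarity = {"src_a": 0.9, "src_b": 0.7, "src_c": 0.4, "src_d": 0.2}
transfer_score = {"src_a": 0.81, "src_b": 0.78, "src_c": 0.55, "src_d": 0.60}

sources = sorted(similarity)
rho = spearman(
    [similarity[s] for s in sources],
    [transfer_score[s] for s in sources],
)
print(f"Spearman rho: {rho:.2f}")
```

Repeating such a calculation per task, per input representation, and per similarity definition would show whether the correlation holds up or shifts across setups, which is the kind of dependence the abstract reports.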

Problem

Research questions and friction points this paper is trying to address.

Cross-lingual Data Selection

Natural Language Processing

Language Resource Limitation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-lingual Learning

Natural Language Processing

Language Similarity
