🤖 AI Summary
This study challenges the prevailing assumption that multilingual pretraining inherently improves zero-shot cross-lingual transfer, focusing on two lexically sensitive tasks: word-sense disambiguation and lexical semantic change. Through a large-scale empirical analysis across 28 languages, controlled ablation studies, and newly introduced resources (including the LexShift benchmark and the MonoBERT model), the authors show that multilinguality per se is neither necessary nor sufficient for improved transfer. Instead, evaluation biases and the composition of fine-tuning data emerge as the primary confounding factors behind prior claims of a “multilingual advantage.” Carefully designed monolingual or bilingual models match or surpass their multilingual counterparts across multiple metrics. All models, datasets, and evaluation tools are open-sourced, providing a more robust and reproducible framework for cross-lingual semantic evaluation, particularly for low-resource and typologically diverse languages.
📝 Abstract
Cross-lingual transfer allows models to perform tasks in languages unseen during training and is often assumed to benefit from increased multilinguality. In this work, we challenge this assumption in the context of two underexplored, sense-aware tasks: polysemy disambiguation and lexical semantic change. Through a large-scale analysis across 28 languages, we show that multilingual training is neither necessary nor inherently beneficial for effective transfer. Instead, we find that confounding factors, such as fine-tuning data composition and evaluation artifacts, better account for the perceived advantages of multilinguality. Our findings call for more rigorous evaluations in multilingual NLP. We release fine-tuned models and benchmarks to support further research, with implications extending to low-resource and typologically diverse languages.
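
To make the sense-aware setting concrete, the sketch below illustrates the kind of comparison these tasks involve: extracting contextual embeddings of a target word from a pretrained encoder and measuring how its meaning differs across contexts. This is a minimal illustration, not the paper's code; the model name (`bert-base-multilingual-cased`), the example sentences, and the helper function are placeholders chosen for the example.

```python
# Minimal sketch (not the authors' implementation) of a sense-aware comparison:
# pool the contextual vectors of a target word and compare them across contexts.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "bert-base-multilingual-cased"  # illustrative choice; any encoder could be swapped in
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)
model.eval()

def word_embedding(sentence: str, target: str) -> torch.Tensor:
    """Mean-pool the contextual vectors of the subwords belonging to `target`."""
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]          # (seq_len, dim)
    target_ids = tokenizer(target, add_special_tokens=False)["input_ids"]
    ids = enc["input_ids"][0].tolist()
    # Locate the target's subword span inside the sentence encoding.
    for i in range(len(ids) - len(target_ids) + 1):
        if ids[i:i + len(target_ids)] == target_ids:
            return hidden[i:i + len(target_ids)].mean(dim=0)
    raise ValueError(f"target {target!r} not found in sentence")

# Same word, two contexts: a low cosine similarity suggests different senses.
bank_river = word_embedding("They sat on the bank of the river.", "bank")
bank_money = word_embedding("She deposited the cheque at the bank.", "bank")
print(torch.cosine_similarity(bank_river, bank_money, dim=0).item())
```

The same recipe extends to the cross-lingual and diachronic settings studied in the paper by comparing target-word embeddings across languages or across time-specific corpora, which is where the choice of monolingual versus multilingual encoder becomes the variable under test.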