🤖 AI Summary
This work investigates whether large language models (LLMs) are sensitive to linguistic genealogy: specifically, whether they preferentially switch to genetically related languages when they fail to maintain the prompt language, and whether factual knowledge exhibits higher cross-lingual consistency within, rather than across, language families.
Method: Leveraging an extended MultiQ dataset, we propose novel metrics for detecting language-switching patterns and quantifying cross-lingual knowledge consistency, conducting a comparative analysis across multiple state-of-the-art multilingual LLMs.
Contribution/Results: We find that LLMs generally demonstrate genealogical sensitivity, but its strength is strongly modulated by how language-family resources are distributed in the training data. Distinct model families adopt divergent multilingual strategies. Crucially, both linguistic fidelity and knowledge consistency are significantly higher for intra-family language pairs than for inter-family ones. This study provides the first systematic empirical validation of genealogy-level effects in LLM multilingual behavior, revealing their data-driven origin and offering new insights into the structure of linguistic representations in foundation models.
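To make the two metrics concrete, the sketch below shows one way such quantities might be computed; the genus mapping, the record fields (`prompt_lang`, `response_lang`), and the exact-match agreement scoring are illustrative assumptions, not the paper's actual implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical genus lookup (WALS-style); a real mapping would come from the dataset.
GENUS = {"de": "Germanic", "en": "Germanic", "es": "Romance", "it": "Romance", "ru": "Slavic"}

def switch_rate_by_genus(records):
    """records: list of dicts with 'prompt_lang' and 'response_lang' (assumed fields).
    Among responses that abandon the prompt language, returns the share that
    switch to a language in the same genus vs. a different genus."""
    same = diff = 0
    for r in records:
        if r["response_lang"] == r["prompt_lang"]:
            continue  # fidelity maintained; not a language switch
        if GENUS.get(r["response_lang"]) == GENUS.get(r["prompt_lang"]):
            same += 1
        else:
            diff += 1
    total = same + diff
    return {"intra_genus": same / total, "inter_genus": diff / total} if total else {}

def consistency_by_genus(answers):
    """answers: dict mapping language code -> answer string for one question.
    Computes pairwise exact-match agreement, split into intra- vs inter-genus pairs."""
    buckets = defaultdict(list)
    for (l1, a1), (l2, a2) in combinations(answers.items(), 2):
        key = "intra" if GENUS.get(l1) == GENUS.get(l2) else "inter"
        buckets[key].append(float(a1.strip().lower() == a2.strip().lower()))
    return {k: sum(v) / len(v) for k, v in buckets.items() if v}
```

Under these assumptions, a higher intra-genus than inter-genus value on both measures would correspond to the genealogical sensitivity described above.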
📝 Abstract
Large Language Models (LLMs) display notable variation in multilingual behavior, yet the role of genealogical language structure in shaping this variation remains underexplored. In this paper, we investigate whether LLMs exhibit sensitivity to linguistic genera by extending prior analyses on the MultiQ dataset. We first examine whether models prefer to switch to genealogically related languages when prompt-language fidelity is not maintained. Next, we investigate whether knowledge consistency is better preserved within genera than across them. We show that genus-level effects are present but strongly conditioned by training resource availability. We further observe distinct multilingual strategies across LLM families. Our findings suggest that LLMs encode aspects of genus-level structure, but training data imbalances remain the primary factor shaping their multilingual performance.