Are the LLMs Capable of Maintaining at Least the Language Genus?

📅 2025-10-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether large language models (LLMs) exhibit sensitivity to linguistic genealogy: specifically, whether they preferentially switch to genetically related languages when prompt-language fidelity breaks down, and whether factual knowledge is more consistent across languages within a family than across families. Method: leveraging an extended MultiQ dataset, the authors propose novel metrics for detecting language-switching patterns and for quantifying cross-lingual knowledge consistency, and conduct a comparative analysis across multiple state-of-the-art multilingual LLMs. Contribution/Results: LLMs generally demonstrate genealogical sensitivity, but its strength is strongly modulated by how language-family resources are distributed in the training data. Distinct model families adopt divergent multilingual strategies. Crucially, both linguistic fidelity and knowledge consistency are significantly higher for intra-family language pairs than for inter-family ones. This study provides the first systematic empirical validation of genealogy-level effects in LLM multilingual behavior, revealing their data-driven origin and offering new insights into the structure of linguistic representations in foundation models.
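The language-switching analysis described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the genus table, the example (prompt, response) language pairs, and the `related_switch_rate` helper are all assumptions, and a real pipeline would detect the response language with a language-ID model rather than take it as input.

```python
# Hypothetical sketch of the language-switching metric: when a model answers
# in a language other than the prompt language, count how often the switch
# target shares the prompt language's genus. Illustrative data only.

GENUS = {
    "de": "Germanic", "sv": "Germanic", "is": "Germanic",
    "es": "Romance", "pt": "Romance",
    "fi": "Uralic",
}

def related_switch_rate(cases: list[tuple[str, str]]) -> float:
    """Fraction of language switches that land in the prompt language's genus.

    `cases` holds (prompt_language, detected_response_language) pairs; in a
    real pipeline the response language would come from a language-ID model.
    """
    switches = [(p, r) for p, r in cases if p != r]
    if not switches:
        return 0.0
    related = sum(GENUS.get(p) == GENUS.get(r) for p, r in switches)
    return related / len(switches)

cases = [("sv", "sv"),   # fidelity kept: not a switch
         ("is", "de"),   # switch within Germanic
         ("pt", "es"),   # switch within Romance
         ("fi", "de")]   # switch across genera
rate = related_switch_rate(cases)  # 2 of the 3 switches stay in-genus
```

A rate well above the chance level implied by the genus inventory would indicate the genealogical preference the paper tests for.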

📝 Abstract
Large Language Models (LLMs) display notable variation in multilingual behavior, yet the role of genealogical language structure in shaping this variation remains underexplored. In this paper, we investigate whether LLMs exhibit sensitivity to linguistic genera by extending prior analyses on the MultiQ dataset. We first check if models prefer to switch to genealogically related languages when prompt language fidelity is not maintained. Next, we investigate whether knowledge consistency is better preserved within than across genera. We show that genus-level effects are present but strongly conditioned by training resource availability. We further observe distinct multilingual strategies across LLM families. Our findings suggest that LLMs encode aspects of genus-level structure, but training data imbalances remain the primary factor shaping their multilingual performance.
Problem

Research questions and friction points this paper is trying to address.

Investigating LLM sensitivity to genealogical language structure
Examining language switching preferences to related languages
Assessing knowledge consistency preservation within language genera
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extending MultiQ dataset analyses for linguistic genera
Testing language switching preferences within related genera
Evaluating knowledge consistency preservation across genera
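The consistency evaluation in the points above can be illustrated with a minimal sketch: compare how often per-language answers to the same question agree, split by whether the two languages share a genus. The `consistency_by_genus` function, the genus map, and the example answers are hypothetical, and exact string match is only a crude stand-in for whatever consistency measure the paper actually uses.

```python
from itertools import combinations

# Hypothetical illustration of intra- vs inter-genus knowledge consistency:
# agreement rate over language pairs, grouped by shared genus. The genus
# mapping and the answers below are made-up examples, not paper data.

GENUS = {
    "de": "Germanic", "en": "Germanic", "sv": "Germanic",
    "es": "Romance", "it": "Romance",
}

def consistency_by_genus(answers: dict[str, str]) -> tuple[float, float]:
    """Return (intra-genus, inter-genus) agreement rates over language pairs."""
    intra, inter = [], []
    for a, b in combinations(answers, 2):
        agree = float(answers[a] == answers[b])  # exact match as a crude proxy
        (intra if GENUS[a] == GENUS[b] else inter).append(agree)
    mean = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return mean(intra), mean(inter)

# Answers to "What is the capital of Germany?" asked in five languages:
answers = {"de": "Berlin", "en": "Berlin", "sv": "Berlin",
           "es": "Berlín", "it": "Berlino"}
intra, inter = consistency_by_genus(answers)  # intra=0.75, inter=0.0
```

An intra-genus rate consistently above the inter-genus rate, aggregated over many questions and models, is the kind of evidence the paper reports for genus-level knowledge consistency.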