Multilinguality of Large Language Models From a Structural Perspective

📅 2026-06-01

🤖 AI Summary
This study addresses the limited understanding of how large language models (LLMs) represent languages structurally, and in particular how low-resource languages differ from English, which dominates the training data. Whereas prior work focuses on token-level representations, this paper takes a language-structure-oriented perspective, applying representational structural analysis to compare representations across languages. The findings indicate that low-resource languages are structurally more different from English within LLMs than high- and mid-resource languages are. Language-specific post-training alters these internal structures while preserving inter-language relationships, which points to a role for post-training in shaping the multilingual structure of LLMs.
📝 Abstract
Large language models (LLMs) have excelled in processing multiple languages through pre- and post-training on multilingual data, even though English dominates the training data. Prior work focusing on token representations has revealed how those LLMs process non-English text. Although these analyses have provided insightful findings, they fail to capture a structural view, which is an inherent property of language. In this study, we explore the multilinguality of LLMs through representational structural analysis. Our findings reveal that low-resource languages are structurally more different from English than high- and mid-resource languages, and that language-specific post-training alters their structures while preserving inter-language relationships.
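To make the idea of comparing structures rather than individual token vectors concrete, here is a minimal sketch of one common approach, representational similarity analysis. It is an illustration only: the abstract does not say which structural metric the paper uses, and the function names and toy vectors below are hypothetical. Each language is described by the pairwise cosine similarities among the representations of the same set of parallel sentences, and two languages are compared by correlating those similarity profiles.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pairwise_similarities(reps):
    """Similarity of every unordered pair of items within one language."""
    return [cosine(reps[i], reps[j]) for i, j in combinations(range(len(reps)), 2)]


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def structural_similarity(reps_a, reps_b):
    """Correlate the within-language similarity structure of two languages.

    reps_a[i] and reps_b[i] are assumed to be representations of the same
    sentence in two different languages (hypothetical setup).
    """
    return pearson(pairwise_similarities(reps_a), pairwise_similarities(reps_b))


# Toy 2-D "sentence representations" (made-up values, not model outputs).
english = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
# Same geometry at a different scale: the relational structure is unchanged.
rescaled = [[2.0 * x for x in vec] for vec in english]
# Same vectors assigned to different sentences: the structure differs.
reordered = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]

same_structure = structural_similarity(english, rescaled)
different_structure = structural_similarity(english, reordered)
```

Because the score depends only on relations among items within each language, two languages can score as structurally similar even when their raw vectors sit in different regions or scales of the representation space.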
Problem

Research questions and friction points this paper is trying to address.

multilinguality
large language models
structural analysis
low-resource languages
language representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

structural analysis
multilinguality
large language models
low-resource languages
post-training