Multilinguality of Large Language Models From a Structural Perspective

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the limited understanding of how large language models (LLMs) organize multilingual input internally, in particular the systematic differences between low-resource languages and high-resource languages such as English. Whereas prior work focuses mainly on token-level representations, this paper takes a language-structure-oriented perspective, combining representational structural analysis with cross-lingual representation comparison and structural similarity metrics. The findings indicate that, within LLMs, low-resource languages are structurally more different from English than high- and mid-resource languages are, so structural similarity to English tracks language resource availability. Language-specific post-training is also shown to alter these internal structures while preserving inter-language relationships, which points to the role of post-training in shaping the multilingual structural geometry of LLMs.

📝 Abstract

Large language models (LLMs) excel at processing multiple languages thanks to pre- and post-training on multilingual data, even though English dominates that data. Prior work focusing on token representations has revealed how these LLMs process non-English text. Although these analyses have provided insightful findings, they do not capture structure, which is an inherent property of language. In this study, we explore the multilinguality of LLMs through representational structural analysis. Our findings reveal that low-resource languages are structurally more different from English than high- and mid-resource languages, and that language-specific post-training alters their structures while preserving inter-language relationships.
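The abstract does not say which structural similarity measure the authors use. As a rough illustration only, the sketch below shows one common way to compare representational structure across two languages: build a pairwise-distance matrix over the same set of parallel items in each language, then correlate the two matrices. This is a generic representational-similarity sketch, not the paper's method. The function names, the choice of cosine distance and Pearson correlation, and the synthetic arrays standing in for LLM hidden states are all assumptions made for the example.

```python
import numpy as np


def pairwise_cosine_distance(X: np.ndarray) -> np.ndarray:
    """Return the (n, n) matrix of cosine distances between the rows of X."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.clip(norms, 1e-12, None)  # guard against zero-length rows
    return 1.0 - unit @ unit.T


def structural_similarity(X_a: np.ndarray, X_b: np.ndarray) -> float:
    """Correlate the pairwise-distance structure of two representation sets.

    X_a and X_b hold representations of the same n parallel items (row i of
    each refers to the same item); their feature dimensions may differ.
    Hypothetical helper: Pearson correlation is an assumption, chosen to keep
    the example NumPy-only.
    """
    if X_a.shape[0] != X_b.shape[0]:
        raise ValueError("both inputs must describe the same number of items")
    upper = np.triu_indices(X_a.shape[0], k=1)  # each unordered pair once
    d_a = pairwise_cosine_distance(X_a)[upper]
    d_b = pairwise_cosine_distance(X_b)[upper]
    return float(np.corrcoef(d_a, d_b)[0, 1])


# Synthetic stand-ins for hidden states (assumption: no real model is used).
rng = np.random.default_rng(0)
english = rng.normal(size=(50, 16))

# An orthogonal rotation changes the basis but keeps all pairwise geometry.
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
rotated = english @ rotation

# Independent random vectors share no structure with the first set.
unrelated = rng.normal(size=(50, 16))

sim_rotated = structural_similarity(english, rotated)
sim_unrelated = structural_similarity(english, unrelated)
print(f"rotated copy: {sim_rotated:.3f}, unrelated: {sim_unrelated:.3f}")
```

Comparing distance matrices rather than raw vectors makes the score insensitive to rotations of the representation space, which is why the rotated copy scores near 1 while the unrelated set scores near 0. Under this kind of measure, the abstract's claim would correspond to lower scores against English for low-resource languages than for high- and mid-resource ones.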

Problem

Research questions and friction points this paper is trying to address.

multilinguality

large language models

structural analysis

low-resource languages

language representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

structural analysis

multilinguality

large language models