🤖 AI Summary
How multilingual alignment enhances large language models’ (LLMs) multilingual capabilities remains poorly understood at the neural level.
Method: We propose a fine-grained neuron identification algorithm that jointly detects language-specific and language-invariant neurons. Based on the distributional characteristics of these neuron types, we construct a four-stage multilingual reasoning mechanism comprising comprehension, semantic reasoning, output-space transformation, and vocabulary generation, and we empirically investigate "spontaneous multilingual alignment", i.e., cross-lingual neuronal response coupling in models without explicit alignment. Using activation analysis, neuron importance scoring, attention visualization, and cross-lingual statistical modeling, we quantify the effects of alignment on low-resource language representations.
Contribution/Results: Alignment significantly improves the activation specificity of neurons associated with low-resource languages; the four-stage mechanism is validated by observed neural activity patterns; and we establish the first interpretable, neuron-level mechanistic model of multilingual reasoning, finding that alignment exhibits intrinsic emergent properties even without explicit training objectives.
📝 Abstract
Multilingual alignment is an effective and representative paradigm for enhancing LLMs' multilingual capabilities, transferring capabilities from high-resource languages to low-resource languages. Meanwhile, research on language-specific neurons reveals that certain neurons in LLMs are selectively activated when processing different languages. This provides a new perspective for analyzing and understanding LLMs' mechanisms in multilingual scenarios. In this work, we propose a new, finer-grained neuron identification algorithm that detects language neurons (including language-specific neurons and language-related neurons) and language-agnostic neurons. Furthermore, based on the distributional characteristics of the different neuron types, we divide the LLMs' internal process for multilingual inference into four parts: (1) multilingual understanding, (2) shared semantic space reasoning, (3) multilingual output space transformation, and (4) vocabulary space outputting. Additionally, we systematically analyze models before and after alignment, focusing on the different neuron types, and we analyze the phenomenon of "Spontaneous Multilingual Alignment". Overall, our work conducts a comprehensive investigation based on different types of neurons, providing empirical results and valuable insights for better understanding the multilingual alignment and multilingual capabilities of LLMs.
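The abstract's three-way taxonomy (language-specific, language-related, language-agnostic neurons) can be illustrated with a minimal sketch. This is not the paper's actual identification algorithm; it assumes, as a simplification, that we already have each neuron's activation probability per language and classifies neurons by how many languages strongly activate them (the `high`/`low` thresholds are illustrative parameters, not values from the paper):

```python
import numpy as np

def classify_neurons(act_prob, high=0.8, low=0.1):
    """Toy classifier over per-language activation statistics.

    act_prob: (n_neurons, n_languages) array; entry [i, l] is the
    probability that neuron i activates on text in language l.
    Returns one label per neuron.
    """
    n_langs = act_prob.shape[1]
    labels = []
    for p in act_prob:
        n_high = int(np.sum(p >= high))
        if n_high == n_langs:
            # strongly active for every language -> language-agnostic
            labels.append("agnostic")
        elif n_high == 1 and np.all(np.delete(p, np.argmax(p)) <= low):
            # strongly active for exactly one language, near-silent elsewhere
            labels.append("specific")
        elif 1 <= n_high < n_langs:
            # strongly active for a proper subset of languages
            labels.append("related")
        else:
            labels.append("other")  # no clear language preference
    return labels

# Example: three neurons across three languages.
probs = np.array([
    [0.92, 0.05, 0.02],   # fires only for language 0
    [0.90, 0.85, 0.91],   # fires for all languages
    [0.88, 0.83, 0.04],   # fires for languages 0 and 1
])
print(classify_neurons(probs))  # ['specific', 'agnostic', 'related']
```

In this framing, the interesting comparison before and after alignment is how the population counts of each label shift, particularly for neurons tied to low-resource languages.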