🤖 AI Summary
Existing cross-lingual alignment evaluation predominantly relies on sentence embeddings, yet suffers from non-smooth representation spaces, particularly in low-resource languages. To address this, we propose NeuronXA: the first method to adapt the neuroscience-inspired principle of neuron activation overlap to cross-lingual assessment. NeuronXA directly models semantic consistency and cross-lingual transferability by analyzing neuron-level state alignment across parallel sentence pairs in multilingual large language models (e.g., LLaMA, Qwen, Mistral). It requires only 100 parallel sentence pairs for efficient downstream performance prediction. Evaluated on multilingual benchmarks, NeuronXA achieves a Pearson correlation of 0.9556 with downstream task performance and 0.8514 with cross-lingual transferability, substantially improving the accuracy and generalizability of alignment evaluation with a small dataset.
📝 Abstract
Large language models (LLMs) have demonstrated remarkable multilingual capabilities; however, how to evaluate their cross-lingual alignment remains underexplored. Existing alignment benchmarks primarily focus on sentence embeddings, but prior research has shown that neural models tend to induce a non-smooth representation space, which impairs semantic alignment evaluation, particularly for low-resource languages. Inspired by neuroscientific findings that similar information activates overlapping neuronal regions, we propose Neuron State-Based Cross-Lingual Alignment (NeuronXA), a novel method to assess the cross-lingual alignment capabilities of LLMs that offers a more semantically grounded evaluation. We evaluate NeuronXA on several prominent multilingual LLMs (LLaMA, Qwen, Mistral, GLM, and OLMo) across two transfer tasks and three multilingual benchmarks. The results demonstrate that with only 100 parallel sentence pairs, NeuronXA achieves a Pearson correlation of 0.9556 with downstream task performance and 0.8514 with transferability. These findings demonstrate NeuronXA's effectiveness in assessing both cross-lingual alignment and transferability, even with a small dataset. This highlights its potential to advance cross-lingual alignment research and to improve the semantic understanding of multilingual LLMs.
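The abstract describes the core idea (neuron activation states that overlap across parallel sentence pairs) without the exact formulation. The toy sketch below illustrates one plausible reading, not the paper's actual method: binarize each neuron's activation per sentence, then score a language pair by the mean fraction of neurons whose on/off state agrees across parallel pairs. The representations here are random stand-ins for LLM hidden states; `neuronxa_score`, the activation threshold, and the agreement metric are all hypothetical choices for illustration.

```python
import numpy as np

def neuron_states(hidden, threshold=0.0):
    # A neuron counts as "activated" for a sentence if its value
    # exceeds the threshold (hypothetical binarization rule).
    return hidden > threshold

def neuronxa_score(src_reps, tgt_reps, threshold=0.0):
    """Toy alignment score: mean fraction of neurons whose on/off
    state agrees between each parallel sentence pair.
    src_reps, tgt_reps: (n_pairs, n_neurons) hidden-state arrays."""
    src_states = neuron_states(src_reps, threshold)
    tgt_states = neuron_states(tgt_reps, threshold)
    return float((src_states == tgt_states).mean())

rng = np.random.default_rng(0)
n_pairs, n_neurons = 100, 4096  # ~100 parallel pairs, as in the paper

en = rng.normal(size=(n_pairs, n_neurons))
# A "well-aligned" language: shared activations plus small noise.
de = en + 0.1 * rng.normal(size=(n_pairs, n_neurons))
# A "poorly aligned" language: independent activations.
xx = rng.normal(size=(n_pairs, n_neurons))

print(neuronxa_score(en, de) > neuronxa_score(en, xx))  # expect True
```

Under this reading, a higher score for a language pair would predict better cross-lingual transfer; the paper's reported correlations (0.9556 with downstream performance) are measured against its actual formulation, not this sketch.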