🤖 AI Summary
This study investigates whether large language models (LLMs) can reveal how emotional valence is represented in human electroencephalography (EEG), and why existing alignment methods fail to improve decoding performance. The authors construct a one-dimensional valence axis (V-axis) in LLMs from only nine emotion-evocative sentences and show that it is consistent across fourteen models, transfers zero-shot to sentiment benchmarks, and is tracked by a single linear projection of EEG features. Thirty-six EEG classifiers never exposed to the V-axis spontaneously recover the same valence direction. Even so, none of the 25 alignment strategies tested improves decoding, and sixteen significantly reduce accuracy. The authors explain this failure with the “saturation regularity” and instead ensemble across residual-subspace diversity, improving balanced accuracy by 10.5% over the prior best on FACED, with the effect replicated on SEED-V.
📝 Abstract
Large language models (LLMs) have emerged as powerful representation learners whose internal features increasingly align with human cognition. We study whether modern LLMs can serve as a lens for understanding neural representations in the human brain, focusing on emotional valence in EEG.
We first build a one-dimensional valence direction, the V-axis, from modern LLMs using only nine emotion-evocative sentences. We validate it through zero-shot transfer to sentiment benchmarks and cross-model consistency across fourteen LLMs. We then show that this LLM-derived direction maps onto human neural activity. On a public EEG cohort of 123 subjects watching affective videos, a single linear projection on EEG features tracks the V-axis position of each stimulus. Moreover, 36 EEG emotion classifiers trained without exposure to the V-axis spontaneously rediscover the same direction in their internal representations, suggesting that the same valence structure emerges in both language models and human electrophysiology.
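The abstract does not say how the V-axis is computed or how the EEG projection is fitted. As a minimal sketch of one plausible reading, the block below takes the axis as the normalized difference between the mean embeddings of positive and negative sentences, then fits an ordinary least-squares map from EEG features to each stimulus's axis position. The difference-of-means construction, the 4/5 split of the nine sentences, the function names, and the synthetic arrays standing in for LLM embeddings and EEG features are all assumptions made for illustration.

```python
import numpy as np

def valence_axis(pos_emb, neg_emb):
    """Unit vector pointing from the negative-class mean to the positive-class mean.

    Assumption: difference of class means; the paper's construction may differ.
    """
    direction = pos_emb.mean(axis=0) - neg_emb.mean(axis=0)
    return direction / np.linalg.norm(direction)

def fit_linear_projection(eeg_features, targets):
    """Least-squares weights (with a bias term) mapping EEG features to axis positions."""
    X = np.hstack([eeg_features, np.ones((len(eeg_features), 1))])
    weights, *_ = np.linalg.lstsq(X, targets, rcond=None)
    return weights

def apply_linear_projection(eeg_features, weights):
    """Apply the fitted weights (including bias) to EEG features."""
    X = np.hstack([eeg_features, np.ones((len(eeg_features), 1))])
    return X @ weights

# Synthetic stand-ins (hypothetical data, not from the paper).
rng = np.random.default_rng(0)
dim = 16
true_dir = rng.normal(size=dim)
true_dir /= np.linalg.norm(true_dir)

# Nine "sentence embeddings": 4 positive and 5 negative (the split is assumed).
pos_emb = rng.normal(scale=0.1, size=(4, dim)) + true_dir
neg_emb = rng.normal(scale=0.1, size=(5, dim)) - true_dir
axis = valence_axis(pos_emb, neg_emb)

# Stimulus embeddings and their positions along the axis.
stim_emb = rng.normal(size=(200, dim))
v_pos = stim_emb @ axis

# Simulated EEG features: a noisy linear mixture of the stimulus embeddings.
mixing = rng.normal(size=(dim, 32))
eeg = stim_emb @ mixing + rng.normal(scale=0.05, size=(200, 32))

weights = fit_linear_projection(eeg, v_pos)
pred = apply_linear_projection(eeg, weights)
corr = np.corrcoef(pred, v_pos)[0, 1]  # in-sample correlation on synthetic data
```

Because the simulated EEG features are linear in the stimulus embeddings by construction, the in-sample fit here is close to perfect. With real recordings, the correlation would need to be evaluated on held-out subjects or stimuli.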
Yet this convergence does not provide an effective training signal. We test twenty-five alignment strategies, including knowledge distillation, representational similarity, contrastive, and topographic losses; none improve decoding, and sixteen significantly reduce accuracy. We formalize this result as the saturation regularity: once task labels alone drive a brain-decoding network onto the target direction, additional supervision mainly distorts an already-saturated basin, while the load-bearing within-class residual receives little useful gradient.
This regularity also indicates where improvement should come from: the residual subspace unreachable by supervision. Motivated by this insight, we ensemble across residual diversity rather than supervising the basin, improving balanced accuracy by 10.5% over the prior best on FACED, with the same effect replicated on SEED-V.
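For readers unfamiliar with the metric, balanced accuracy is the mean of per-class recall, which keeps a majority class from dominating the score. The sketch below computes it and pairs it with a plain probability-averaging ensemble. The averaging rule is an assumed stand-in: the abstract does not specify how the ensemble is formed or how residual diversity is selected, and the toy arrays are hypothetical.

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

def ensemble_predict(prob_list):
    """Average class-probability arrays from several models, then take the argmax.

    Assumption: simple averaging; the paper's ensembling scheme is not described here.
    """
    return np.mean(np.stack(prob_list), axis=0).argmax(axis=1)

# Toy example: three hypothetical models scoring two samples over two classes.
probs_a = np.array([[0.9, 0.1], [0.6, 0.4]])
probs_b = np.array([[0.4, 0.6], [0.3, 0.7]])
probs_c = np.array([[0.7, 0.3], [0.2, 0.8]])
ensemble_labels = ensemble_predict([probs_a, probs_b, probs_c])

# Balanced accuracy on an imbalanced toy label set:
# class 0 recall is 2/3 and class 1 recall is 1, so the mean is 5/6.
score = balanced_accuracy([0, 0, 0, 1], [0, 0, 1, 1])
```

In the toy example, no single model is right on both samples with high confidence, but the averaged probabilities yield labels 0 and 1. That is the general motivation for ensembling models whose errors differ.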