Fine-grained Analysis of Brain-LLM Alignment through Input Attribution

📅 2025-10-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates the computational mechanisms underlying the alignment between large language model (LLM) representations and human brain activity, contrasting brain alignment (BA) with next-word prediction (NWP). Method: The study introduces a fine-grained, word-level analytical framework based on input attribution that quantifies how much each individual word contributes to BA and to NWP, integrating fMRI/MEG neural responses with cognitive metrics. Contribution/Results: BA relies predominantly on semantic coherence and discourse-level information, whereas NWP is driven primarily by syntactic structure together with recency and primacy effects, indicating that the two objectives depend on largely distinct sets of words. The work is presented as the first word-level disentanglement of the cognitive foundations of LLM language competence versus neural alignment, offering a methodology and empirical evidence for advancing the cognitive interpretability of LLMs and for developing neurobiologically plausible language models.

📝 Abstract

Understanding the alignment between large language models (LLMs) and human brain activity can reveal computational principles underlying language processing. We introduce a fine-grained input attribution method to identify the specific words most important for brain-LLM alignment, and leverage it to study a contentious research question about brain-LLM alignment: the relationship between brain alignment (BA) and next-word prediction (NWP). Our findings reveal that BA and NWP rely on largely distinct word subsets: NWP exhibits recency and primacy biases with a focus on syntax, while BA prioritizes semantic and discourse-level information with a more targeted recency effect. This work advances our understanding of how LLMs relate to human language processing and highlights differences in feature reliance between BA and NWP. Beyond this study, our attribution method can be broadly applied to explore the cognitive relevance of model predictions in diverse language processing tasks.
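The abstract does not say which attribution technique is used, so the following is only a minimal sketch of the general idea of word-level input attribution, using leave-one-out occlusion as a stand-in. Everything here is a hypothetical illustration rather than the authors' pipeline: the `occlusion_attribution` helper, the toy `toy_brain_alignment` score, the random "embeddings", the fixed `position_weights`, and the linear `readout` to voxels.

```python
import numpy as np


def occlusion_attribution(score_fn, embeddings, baseline=None):
    """Leave-one-out attribution: how much the score drops when each
    word's embedding is replaced by a baseline (zeros by default)."""
    embeddings = np.asarray(embeddings, dtype=float)
    if baseline is None:
        baseline = np.zeros(embeddings.shape[1])
    full_score = score_fn(embeddings)
    attributions = np.empty(len(embeddings))
    for i in range(len(embeddings)):
        occluded = embeddings.copy()
        occluded[i] = baseline  # remove word i from the context
        attributions[i] = full_score - score_fn(occluded)
    return attributions


# Toy stand-ins (assumptions for illustration, not real data or a real LLM).
rng = np.random.default_rng(0)
n_words, dim, n_voxels = 6, 4, 5
context = rng.normal(size=(n_words, dim))        # one embedding per context word
position_weights = np.array([0.0, 0.1, 0.1, 0.2, 0.2, 0.4])  # hypothetical pooling
readout = rng.normal(size=(dim, n_voxels))       # hypothetical linear encoding model
recorded = rng.normal(size=n_voxels)             # hypothetical brain response


def toy_brain_alignment(embeddings):
    """Pearson correlation between predicted and 'recorded' voxel responses."""
    pooled = position_weights @ embeddings       # shape: (dim,)
    predicted = pooled @ readout                 # shape: (n_voxels,)
    return float(np.corrcoef(predicted, recorded)[0, 1])


# One attribution value per context word for the toy BA score.
ba_attributions = occlusion_attribution(toy_brain_alignment, context)
```

Passing a different `score_fn`, for example the log-probability a model assigns to the next word, would give a second attribution vector over the same words, which is the kind of BA versus NWP comparison the abstract describes. Gradient-based attribution methods could be substituted for occlusion without changing this interface.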

Problem

Research questions and friction points this paper is trying to address.

Identifying key words driving brain-LLM alignment using attribution methods

Investigating relationship between brain alignment and next-word prediction

Revealing that brain alignment and next-word prediction rely on largely distinct word subsets (semantic and discourse-level information versus syntax)

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-grained input attribution for brain-LLM alignment

Identifies distinct word subsets for brain alignment versus prediction

Attribution method applicable to probing the cognitive relevance of model predictions in other language processing tasks
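To make the notion of "distinct word subsets" concrete, one simple way to compare two attribution vectors is to take the top-k words under each and measure their overlap. This is a sketch under stated assumptions: the helper names `top_k_words` and `jaccard` are hypothetical, the attribution values are invented toy numbers rather than results from the paper, and the paper may quantify overlap differently.

```python
import numpy as np


def top_k_words(attributions, k):
    """Indices of the k words with the highest attribution scores."""
    order = np.argsort(-np.asarray(attributions, dtype=float))
    return set(order[:k].tolist())


def jaccard(a, b):
    """Jaccard overlap between two index sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Invented toy attribution scores over a six-word context (not real results).
ba_attr = np.array([0.05, 0.40, 0.02, 0.30, 0.01, 0.10])
nwp_attr = np.array([0.35, 0.03, 0.02, 0.04, 0.06, 0.50])

ba_top = top_k_words(ba_attr, k=2)    # words 1 and 3
nwp_top = top_k_words(nwp_attr, k=2)  # words 0 and 5 (first and last position)
overlap = jaccard(ba_top, nwp_top)    # 0.0: the two top-2 sets are disjoint
```

A low overlap across many contexts would be consistent with the two objectives relying on different words. A rank correlation over the full attribution vectors is an alternative that avoids choosing k.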
