Mapping Whisper Representations to Human ECoG Responses with Interpretable Time-Resolved Neural Encoding

📅 2026-06-01
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This study examines how well the internal representations of the speech foundation model Whisper align with human cortical activity during naturalistic speech perception. The authors propose a time-resolved, interpretable neural encoding framework that combines Whisper embeddings with a recurrent temporal model and soft attention, and they run layer-wise analyses on high-resolution intracranial electrocorticography (ECoG) recordings. Intermediate Whisper layers align most strongly with cortical responses. Encoders with temporally structured modeling outperform linear mappings from the same speech representations, and the attention maps show temporally local alignment between speech embeddings and neural responses. An electrode-level phonemic analysis further identifies anatomically coherent phoneme-category organization among encoding-informative electrodes, suggesting that speech foundation models are a useful framework for studying time-resolved cortical speech representations.
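To make the encoder described above more concrete, here is a minimal forward-pass sketch of a recurrent model with soft attention over a window of speech embeddings. It is an illustration under stated assumptions, not the authors' architecture: the plain tanh recurrence, the single attention vector `v`, all dimensions, and the random untrained weights are hypothetical choices, since the summary does not specify the recurrent cell or the attention form.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def rnn_states(X, W_in, W_rec):
    """Run a simple tanh recurrence over a window of embeddings.

    X has shape (window_len, embed_dim); returns (window_len, hidden_dim).
    A plain tanh RNN is an assumption made for brevity.
    """
    h = np.zeros(W_rec.shape[0])
    states = []
    for x in X:
        h = np.tanh(x @ W_in + h @ W_rec)
        states.append(h)
    return np.stack(states)

def attend_and_predict(X, params):
    """Predict one time point of neural activity for all electrodes.

    Returns the prediction and the attention weights over the window,
    which are the quantities one would inspect as an attention map.
    """
    H = rnn_states(X, params["W_in"], params["W_rec"])
    weights = softmax(H @ params["v"])   # one weight per time step in the window
    context = weights @ H                # attention-weighted summary of the states
    y_hat = context @ params["W_out"]    # linear readout to electrodes
    return y_hat, weights

# Hypothetical sizes: 10 embedding frames, 8-dim embeddings, 6 hidden units, 4 electrodes.
rng = np.random.default_rng(0)
window_len, embed_dim, hidden_dim, n_electrodes = 10, 8, 6, 4
params = {
    "W_in": 0.3 * rng.standard_normal((embed_dim, hidden_dim)),
    "W_rec": 0.3 * rng.standard_normal((hidden_dim, hidden_dim)),
    "v": rng.standard_normal(hidden_dim),
    "W_out": rng.standard_normal((hidden_dim, n_electrodes)),
}
X = rng.standard_normal((window_len, embed_dim))  # stand-in for Whisper layer embeddings
y_hat, weights = attend_and_predict(X, params)
```

In a trained model of this kind, stacking `weights` across target time points would give a map of which embedding frames drive each predicted neural sample. Temporally local alignment would appear as weight mass concentrated on a narrow range of frames.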
📝 Abstract
Understanding how speech foundation models relate to human cortical activity is a key challenge for computational neuroscience. Here, we investigate how internal representations from Whisper predict intracranial ECoG responses during naturalistic speech perception. We introduce a time-resolved neural encoder that combines speech embeddings with a recurrent temporal model and soft attention, allowing us to examine layer-wise brain alignment. Intermediate Whisper layers provide the strongest correspondence with neural activity, supporting a hierarchical match between model representations and cortical speech processing. Comparisons with baselines show that high-resolution ECoG responses benefit from temporally structured modelling beyond linear mappings from the same speech representations. In addition, attention maps reveal temporally local alignment between speech embeddings and neural responses, while a phonemic interpretability analysis identifies anatomically coherent phoneme-category organization among encoding-informative electrodes. Together, these results suggest that speech foundation models offer a useful framework for studying time-resolved cortical speech representations.
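The linear-mapping baseline and the layer-wise comparison mentioned in the abstract can be sketched as follows. This is a generic ridge-regression encoding analysis, not the paper's pipeline. The data are synthetic, and the function names, the regularization strength `alpha`, the train/test split, and all dimensions are assumptions introduced for illustration.

```python
import numpy as np

def ridge_fit(X, Y, alpha=1.0):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ Y)

def pearson_per_column(A, B):
    """Pearson correlation between matching columns of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    denom = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0)
    return (A * B).sum(axis=0) / denom

def layerwise_scores(layer_feats, Y, n_train, alpha=1.0):
    """Fit one linear encoder per layer and score it on held-out samples.

    layer_feats: list of (time, embed_dim) arrays, one per model layer.
    Y: (time, n_electrodes) neural responses aligned to the same time axis.
    Returns the mean held-out correlation across electrodes for each layer.
    """
    scores = []
    for X in layer_feats:
        W = ridge_fit(X[:n_train], Y[:n_train], alpha)
        pred = X[n_train:] @ W
        scores.append(float(pearson_per_column(pred, Y[n_train:]).mean()))
    return scores

# Synthetic demo: responses are generated from the middle "layer" plus noise,
# so that layer should score highest. Real ECoG data would replace Y.
rng = np.random.default_rng(0)
n_time, embed_dim, n_electrodes = 400, 16, 5
layers = [rng.standard_normal((n_time, embed_dim)) for _ in range(3)]
W_true = rng.standard_normal((embed_dim, n_electrodes))
Y = layers[1] @ W_true + 0.1 * rng.standard_normal((n_time, n_electrodes))

scores = layerwise_scores(layers, Y, n_train=300)
best_layer = int(np.argmax(scores))
```

A temporally structured encoder would be compared against these per-layer scores on the same held-out split, so that any gain reflects the temporal modeling and not a difference in input features.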
Problem

Research questions and friction points this paper is trying to address.

speech foundation models
neural encoding
ECoG
cortical speech processing
brain alignment
Innovation

Methods, ideas, or system contributions that make the work stand out.

neural encoding
speech foundation models
time-resolved modeling
ECoG
interpretable attention