Brain-tuned Speech Models Better Reflect Speech Processing Stages in the Brain

📅 2025-06-04
📈 Citations: 0
Influential: 0
🤖 AI Summary
State-of-the-art pretrained speech models exhibit strong task performance but display a representational hierarchy misaligned with human auditory processing: mid-layer representations are semantically rich, whereas late layers lack semantic specificity. Method: a cross-modal fine-tuning framework ("brain-tuning") guided by fMRI and EEG neural signals, the first to align self-supervised speech models (e.g., wav2vec 2.0) with intermediate stages of human cortical speech processing. Layer-wise probing and Representational Similarity Analysis (RSA) are employed for validation. Results: after brain-tuning, functional specialization across layers improves significantly: early layers specialize in acoustic features, while late layers robustly encode high-level semantics, establishing a clear "acoustic → semantic" hierarchical progression. Crucially, late-layer representations show markedly improved alignment with canonical semantic brain regions. Beyond performance gains, the approach yields a biologically plausible, interpretable, and empirically testable speech processing architecture, offering computational neuroscience a neuroscientifically grounded modeling paradigm.
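The RSA step mentioned above compares representational geometries: build a representational dissimilarity matrix (RDM) over stimuli for each system, then rank-correlate the RDMs' upper triangles. The sketch below is illustrative only, with synthetic stand-ins for layer activations and brain responses; the dimensions, data, and the pure-NumPy Spearman computation are assumptions, not the paper's pipeline.

```python
import numpy as np

def rdm(acts):
    """Representational dissimilarity matrix: 1 - Pearson correlation
    between activation vectors for every pair of stimuli."""
    return 1.0 - np.corrcoef(acts)

def rsa_score(acts_a, acts_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices(acts_a.shape[0], k=1)
    a, b = rdm(acts_a)[iu], rdm(acts_b)[iu]
    # Spearman = Pearson correlation of the rank-transformed values
    # (double argsort yields ranks; fine for tie-free continuous data)
    ra, rb = np.argsort(np.argsort(a)), np.argsort(np.argsort(b))
    return np.corrcoef(ra, rb)[0, 1]

# Toy example: 10 stimuli, a model layer with 32 units vs. 50 "voxels".
# Both are driven by shared latent stimulus features, so some
# representational similarity exists by construction.
rng = np.random.default_rng(0)
stims = rng.standard_normal((10, 8))
layer_acts = stims @ rng.standard_normal((8, 32))   # simulated model layer
voxel_acts = stims @ rng.standard_normal((8, 50))   # simulated brain responses
print(f"RSA score: {rsa_score(layer_acts, voxel_acts):.3f}")
```

A higher score for a given layer against a given brain region is read as tighter representational alignment of that layer with that region.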

📝 Abstract
Pretrained self-supervised speech models excel in speech tasks but do not reflect the hierarchy of human speech processing, as they encode rich semantics in middle layers and poor semantics in late layers. Recent work showed that brain-tuning (fine-tuning models using human brain recordings) improves speech models' semantic understanding. Here, we examine how well brain-tuned models further reflect the brain's intermediate stages of speech processing. We find that late layers of brain-tuned models substantially improve over pretrained models in their alignment with semantic language regions. Further layer-wise probing reveals that early layers remain dedicated to low-level acoustic features, while late layers become the best at complex high-level tasks. These findings show that brain-tuned models not only perform better but also exhibit a well-defined hierarchical processing going from acoustic to semantic representations, making them better model organisms for human speech processing.
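The layer-wise probing described in the abstract can be sketched as fitting a simple supervised probe on each layer's frozen features and comparing scores across layers. The example below is a minimal synthetic illustration, assuming a ridge-regression probe scored by training R^2 (a real setup would use held-out data and the models' actual activations); the data and dimensions are invented.

```python
import numpy as np

def ridge_probe_score(feats, targets, lam=1.0):
    """Fit a ridge-regression probe on frozen layer features and
    return R^2 on the same data (illustrative; use held-out data in practice)."""
    X = np.hstack([feats, np.ones((feats.shape[0], 1))])  # append bias column
    w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ targets)
    pred = X @ w
    ss_res = np.sum((targets - pred) ** 2)
    ss_tot = np.sum((targets - targets.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot

# Toy layer-wise comparison: probe two simulated "layers" for a semantic target.
rng = np.random.default_rng(1)
target = rng.standard_normal((200, 1))       # stand-in semantic variable
early = rng.standard_normal((200, 16))       # acoustic-like layer: no semantics
late = np.hstack([target + 0.1 * rng.standard_normal((200, 1)),
                  rng.standard_normal((200, 15))])  # semantics-rich layer
for name, feats in [("early", early), ("late", late)]:
    print(f"{name} layer probe R^2 = {ridge_probe_score(feats, target):.3f}")
```

In this framing, a well-ordered hierarchy shows up as low-level targets probing best from early layers and high-level semantic targets probing best from late layers.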
Problem

Research questions and friction points this paper is trying to address.

Do brain-tuned speech models align better with the brain's semantic processing stages?
Do late layers of brain-tuned models improve semantic alignment over their pretrained counterparts?
Does brain-tuning induce a hierarchical progression from acoustic to semantic representations?
Innovation

Methods, ideas, or system contributions that make the work stand out.

Brain-tuning (fine-tuning with human brain recordings) aligns model layers with the brain's speech processing stages
Late layers show improved alignment with semantic language regions after brain-tuning
A well-defined acoustic-to-semantic hierarchy emerges across layers, making the models better model organisms for human speech processing