Improving semantic understanding in speech language models via brain-tuning

πŸ“… 2024-10-11
πŸ›οΈ arXiv.org
πŸ“ˆ Citations: 3
✨ Influential: 1
πŸ€– AI Summary
Current speech language models over-rely on low-level acoustic features and are poorly aligned with the brain's semantic processing, limiting their validity as neurocomputational models. To address this, the authors propose "brain-tuning": the first method to directly use fMRI signals recorded during naturalistic story listening as supervision for fine-tuning speech models (Whisper, Wav2Vec, and SpeechT5). Brain-tuning improves neural representational alignment in semantically sensitive brain regions (e.g., superior temporal gyrus, angular gyrus). By combining cross-modal alignment with fMRI-driven supervised learning, it systematically improves semantic understanding: downstream task performance increases consistently across benchmarks, latent-space semantic preference strengthens significantly, and reliance on spurious acoustic cues diminishes. This work establishes a paradigm for developing neurobiologically grounded language models trained against empirical neural data.
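The summary describes brain-tuning as fine-tuning a speech model against fMRI recordings. A minimal sketch of one plausible form of this objective is below: pooled speech-model features are linearly projected to predicted voxel responses and fit to the recorded fMRI signal with an MSE loss. The head architecture, dimensions, and time-pooling are illustrative assumptions, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class BrainTuningHead(nn.Module):
    """Hypothetical encoding head: speech features -> predicted fMRI voxels."""

    def __init__(self, feature_dim: int, n_voxels: int):
        super().__init__()
        # Linear encoding model, a common choice in fMRI alignment work
        self.proj = nn.Linear(feature_dim, n_voxels)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, time, feature_dim); average-pool over time as a
        # simplifying stand-in for matching the slow fMRI sampling rate
        pooled = features.mean(dim=1)
        return self.proj(pooled)

def brain_tuning_loss(predicted: torch.Tensor, fmri: torch.Tensor) -> torch.Tensor:
    # MSE between predicted and recorded voxel responses; in practice this
    # would be combined with the model's original training objective
    return nn.functional.mse_loss(predicted, fmri)

# Toy usage: random tensors stand in for model features and fMRI recordings
torch.manual_seed(0)
features = torch.randn(4, 50, 768)   # 4 clips, 50 frames, 768-dim features
fmri = torch.randn(4, 1000)          # 1000 voxel responses per clip
head = BrainTuningHead(768, 1000)
loss = brain_tuning_loss(head(features), fmri)
```

In a full fine-tuning loop, this loss would backpropagate into the pretrained speech model's layers, biasing its representations toward brain-predictive (and, per the paper's results, more semantic) features.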

πŸ“ Abstract
Speech language models align with human brain responses to natural language to an impressive degree. However, current models rely heavily on low-level speech features, indicating they lack brain-relevant semantics, which limits their utility as model organisms of semantic processing in the brain. In this work, we address this limitation by inducing brain-relevant bias directly into the models via fine-tuning with fMRI recordings of people listening to natural stories, a process we name brain-tuning. After testing it on three different pretrained model families, we show that brain-tuning not only improves overall alignment with new brain recordings in semantic language regions, but also reduces the reliance on low-level speech features for this alignment. Excitingly, we further show that brain-tuning leads to 1) consistent improvements in performance on a range of downstream tasks and 2) a representational space with increased semantic preference. Our results provide converging evidence, for the first time, that incorporating brain signals into the training of language models improves the models' semantic understanding.
Problem

Research questions and friction points this paper is trying to address.

Enhance semantic understanding in speech language models
Reduce reliance on low-level speech features
Improve alignment with human brain responses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuning models with fMRI recordings
Reducing reliance on low-level speech features
Enhancing semantic understanding via brain-tuning