BrainWavLM: Fine-tuning Speech Representations with Brain Responses to Language

๐Ÿ“… 2025-02-13
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿค– AI Summary
Existing speech encoding models are limited in how well they predict neural responses to speech, and it remains unclear whether brain signals can enhance semantic representations of speech without explicit supervision. Method: We propose BrainWavLM, an end-to-end brain-encoding framework that integrates Low-Rank Adaptation (LoRA) into the WavLM architecture and fine-tunes it using fMRI responses as supervision. Unlike conventional linear-mapping approaches, BrainWavLM supports both whole-cortex encoding improvement and selective fine-tuning on auditory cortex. Contribution/Results: Experiments demonstrate substantial gains in cross-subject generalization and encoding stability. Linear-probe analyses confirm that brain-informed representations are more robust on semantic tasks. This work establishes a paradigm for brain-inspired speech modeling and unsupervised semantic enhancement, bridging neuroscientific insights with self-supervised speech representation learning.

๐Ÿ“ Abstract
Speech encoding models use auditory representations to predict how the human brain responds to spoken language stimuli. Most performant encoding models linearly map the hidden states of artificial neural networks to brain data, but this linear restriction may limit their effectiveness. In this work, we use low-rank adaptation (LoRA) to fine-tune a WavLM-based encoding model end-to-end on a brain encoding objective, producing a model we name BrainWavLM. We show that fine-tuning across all of cortex improves average encoding performance with greater stability than without LoRA. This improvement comes at the expense of low-level regions like auditory cortex (AC), but selectively fine-tuning on these areas improves performance in AC, while largely retaining gains made in the rest of cortex. Fine-tuned models generalized across subjects, indicating that they learned robust brain-like representations of the speech stimuli. Finally, by training linear probes, we showed that the brain data strengthened semantic representations in the speech model without any explicit annotations. Our results demonstrate that brain fine-tuning produces best-in-class speech encoding models, and that non-linear methods have the potential to bridge the gap between artificial and biological representations of semantics.
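The core recipe in the abstract — freeze a pretrained speech model, inject trainable low-rank adapters, and optimize end-to-end against recorded brain responses — can be sketched in miniature. The toy below is a hedged illustration, not the authors' implementation: the `LoRALinear` class, the layer sizes, the voxel count, and the random data are all stand-ins (the real setup uses WavLM hidden states and fMRI responses).

```python
import torch

class LoRALinear(torch.nn.Module):
    """A frozen linear layer plus a trainable low-rank update,
    as in LoRA: y = W x + (alpha / r) * B A x."""
    def __init__(self, base: torch.nn.Linear, r: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze the pretrained weights
        # A is small-random, B is zero, so the adapter starts as a no-op.
        self.A = torch.nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = torch.nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

# Toy "speech model" layer and a linear head predicting per-voxel responses.
torch.manual_seed(0)
layer = LoRALinear(torch.nn.Linear(64, 64))
head = torch.nn.Linear(64, 100)   # 100 hypothetical voxels

feats = torch.randn(8, 64)        # stand-in for WavLM hidden states
target = torch.randn(8, 100)      # stand-in for recorded brain responses
loss = torch.nn.functional.mse_loss(head(layer(feats)), target)
loss.backward()

print(layer.base.weight.grad is None)  # True: the frozen base gets no gradient
print(layer.A.grad is not None)        # True: only the adapters and head train
```

The design point this illustrates is why LoRA keeps brain fine-tuning stable: the brain-encoding loss only updates a small number of adapter parameters, leaving the pretrained WavLM weights intact.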
Problem

Research questions and friction points this paper is trying to address.

Fine-tuning speech models with brain responses
Improving encoding performance across cortex
Enhancing semantic representations without explicit annotations
Innovation

Methods, ideas, or system contributions that make the work stand out.

LoRA fine-tuning of WavLM
Non-linear brain encoding
Cross-subject generalization
Nishitha Vattikonda
Department of Computer Science, The University of Texas at Austin
Aditya R. Vaidya
Department of Computer Science, The University of Texas at Austin
Richard Antonello
Postdoctoral Scholar, Columbia University
Natural Language Processing · Computational Neuroscience · Neuroscience of Language
Alexander G. Huth
Departments of Computer Science and Neuroscience, The University of Texas at Austin