Teaching Wav2Vec2 the Language of the Brain

📅 2025-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address speech decoding for paralyzed individuals, this work pioneers cross-modal knowledge transfer from the audio self-supervised model Wav2Vec2 to electroencephalography (EEG)-based neural signal decoding. Methodologically, the authors replace Wav2Vec2's original audio frontend with a learnable Brain Feature Extractor (BFE), yielding an end-to-end sequence-to-sequence model that maps neural signals directly to text. They systematically evaluate three transfer paradigms: full fine-tuning, training from scratch, and freezing the backbone. The key contributions are: (1) the first empirical validation that Wav2Vec2's self-supervised representations transfer to neural signals; (2) the design of the BFE module for effective modality adaptation; and (3) evidence that pretraining substantially improves performance—full fine-tuning achieves a character error rate (CER) of 18.54%, outperforming training from scratch and frozen-backbone baselines by 20.46 and 15.92 percentage points, respectively, significantly improving brain decoding performance for identical architectures.
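The modality swap described above can be sketched in PyTorch: a small convolutional module stands in for the Brain Feature Extractor, mapping multi-channel neural recordings to the frame-level feature sequence that Wav2Vec2's transformer encoder consumes. All layer sizes, channel counts, and strides below are illustrative assumptions, not the paper's actual BFE architecture.

```python
import torch
import torch.nn as nn

class BrainFeatureExtractor(nn.Module):
    """Hypothetical BFE sketch: projects (batch, channels, time) neural
    signals into a (batch, frames, hidden) sequence shaped like the
    output of Wav2Vec2's audio feature encoder (hidden size 768 for
    the base model). Sizes here are illustrative only."""

    def __init__(self, n_channels=128, hidden=512, out_dim=768):
        super().__init__()
        self.conv = nn.Sequential(
            # Each strided conv halves the temporal resolution.
            nn.Conv1d(n_channels, hidden, kernel_size=5, stride=2, padding=2),
            nn.GELU(),
            nn.Conv1d(hidden, out_dim, kernel_size=5, stride=2, padding=2),
            nn.GELU(),
        )

    def forward(self, x):
        feats = self.conv(x)            # (batch, out_dim, time / 4)
        return feats.transpose(1, 2)    # (batch, frames, out_dim)

bfe = BrainFeatureExtractor()
x = torch.randn(2, 128, 400)            # 2 samples, 128 channels, 400 steps
out = bfe(x)                            # shape: (2, 100, 768)
```

In a full pipeline, this output would replace the audio encoder's features and feed Wav2Vec2's transformer layers, which are then trained with a CTC objective against character targets.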

📝 Abstract
The decoding of continuously spoken speech from neuronal activity has the potential to become an important clinical solution for paralyzed patients. Deep learning Brain-Computer Interfaces (BCIs) have recently successfully mapped neuronal activity to text in subjects who attempted to formulate speech. However, only small BCI datasets are available. In contrast, labeled data and pre-trained models for the closely related task of speech recognition from audio are widely available. One such model is Wav2Vec2, which has been trained in a self-supervised fashion to create meaningful representations of speech audio data. In this study, we show that patterns learned by Wav2Vec2 are transferable to brain data. Specifically, we replace its audio feature extractor with an untrained Brain Feature Extractor (BFE) model. We then execute full fine-tuning with pre-trained Wav2Vec2 weights, training "from scratch" without pre-trained weights, as well as freezing a pre-trained Wav2Vec2 and training only the BFE, each for 45 different BFE architectures. Across these experiments, the best run comes from full fine-tuning with pre-trained weights, achieving a Character Error Rate (CER) of 18.54% and outperforming the best training-from-scratch run by 20.46 percentage points and the best frozen-Wav2Vec2 run by 15.92 percentage points. These results indicate that knowledge transfer from audio speech recognition to brain decoding is possible and significantly improves brain decoding performance for the same architectures. Related source code is available at https://github.com/tfiedlerdev/Wav2Vec2ForBrain.
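The Character Error Rate quoted above is the character-level Levenshtein (edit) distance between the decoded text and the reference, normalized by the reference length. A minimal self-contained implementation:

```python
def edit_distance(ref, hyp):
    """Levenshtein distance over characters, via the classic
    dynamic-programming recurrence with a single rolling row."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,            # deletion
                        dp[j - 1] + 1,        # insertion
                        prev + (r != h))      # substitution (free if equal)
            prev = cur
    return dp[-1]

def cer(reference, hypothesis):
    """Character Error Rate: minimum edits / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)

print(cer("hello world", "hella world"))  # one substitution in 11 chars
```

A CER of 18.54% therefore means that, on average, roughly one character edit per five reference characters is needed to recover the target transcript.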
Problem

Research questions and friction points this paper is trying to address.

Wav2Vec2
Brain-Computer Interfaces
Audio Language Understanding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Wav2Vec2
Brain Feature Extractor
Transfer Learning
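The three transfer paradigms compared in the paper differ only in which parameters are trained and whether pretrained weights are kept. A toy sketch with stand-in modules (plain linear layers substitute for the real Wav2Vec2 backbone and BFE; this is not the paper's code):

```python
import torch.nn as nn

# Stand-ins for the pretrained Wav2Vec2 backbone and the new BFE.
backbone = nn.Linear(16, 16)
bfe = nn.Linear(8, 16)

# Paradigm 1 (full fine-tuning): every parameter stays trainable
# and the backbone keeps its pretrained weights.
all_params = list(bfe.parameters()) + list(backbone.parameters())

# Paradigm 2 (training from scratch): same trainable set, but the
# pretrained backbone weights are discarded and re-initialized first.
backbone.reset_parameters()

# Paradigm 3 (frozen backbone): gradients flow only into the BFE.
for p in backbone.parameters():
    p.requires_grad = False
trainable = [p for p in all_params if p.requires_grad]
```

In the frozen setting, the optimizer would be constructed over `trainable` only, so the pretrained representations are used as-is while the BFE learns the modality adaptation.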
Tobias Fiedler
Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam
Leon Hermann
Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam
Florian Muller
Hasso Plattner Institute, Digital Engineering Faculty, University of Potsdam
Sarel Cohen
The Academic College of Tel Aviv-Yaffo
Algorithms
Peter Chin
Dartmouth College, Hanover, NH 03755
Tobias Friedrich
Chair for Algorithm Engineering, Hasso Plattner Institute, Potsdam, Germany
Algorithm Engineering · Randomness · Artificial Intelligence · Data Science · Network Science
E. Vaadia
The Edmond and Lily Safra Center for Brain Sciences at The Hebrew University of Jerusalem, Israel