🤖 AI Summary
This study addresses the challenge of accurately reconstructing binary speech–silence sequences from magnetoencephalography (MEG) signals in order to elucidate the neural mechanisms underlying continuous speech processing. To this end, the authors propose SHINE, a hierarchical multi-scale integration architecture tailored to EEG/MEG time series. A novel aspect of the approach is the joint use of the speech envelope and Mel spectrogram as auxiliary supervision signals in a multi-task learning framework, which the authors describe as the first such application to speech detection. The method combines deep sequential modeling, auxiliary reconstruction objectives, and model ensembling with established baselines (BrainMagic, AWavNet, ConvConcatNet). Evaluated on the LibriBrain 2025 benchmark, the ensemble achieves F1-macro scores of 0.9155 on the Standard Track and 0.9184 on the Extended Track, substantially outperforming existing approaches.
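The multi-task setup described above can be sketched as a weighted sum of a detection loss and two reconstruction losses. The function below is a minimal illustration, not the paper's implementation: the loss weights, the choice of binary cross-entropy for detection, and MSE for the envelope and Mel-spectrogram targets are all assumptions.

```python
import numpy as np

def multitask_loss(p_speech, y_speech, env_hat, env, mel_hat, mel,
                   w_env=0.5, w_mel=0.5):
    """Illustrative multi-task objective: binary cross-entropy for
    speech/silence detection plus MSE reconstruction terms for the
    speech envelope and Mel spectrogram. Weights w_env and w_mel are
    hypothetical; the paper's actual weighting is not specified here."""
    eps = 1e-9  # numerical floor to keep log() finite
    bce = -np.mean(y_speech * np.log(p_speech + eps)
                   + (1 - y_speech) * np.log(1 - p_speech + eps))
    mse_env = np.mean((env_hat - env) ** 2)
    mse_mel = np.mean((mel_hat - mel) ** 2)
    return bce + w_env * mse_env + w_mel * mse_mel
```

The auxiliary terms act as regularizers: the network must keep enough acoustic detail in its representation to reconstruct the envelope and spectrogram, which plausibly sharpens the speech/silence boundary decisions.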
📝 Abstract
How natural speech is represented in the brain is a central question in cognitive neuroscience, with cortical envelope-following responses playing a key role in speech decoding. This paper presents our approach to the Speech Detection task of the LibriBrain Competition 2025, which provides over 50 hours of magnetoencephalography (MEG) recordings from a single participant listening to LibriVox audiobooks. We propose the Sequential Hierarchical Integration Network for EEG and MEG (SHINE) to reconstruct binary speech–silence sequences from MEG signals. In the Extended Track, we further incorporate auxiliary reconstruction of the speech envelope and Mel spectrogram to enhance training. Ensembles combining SHINE with the baselines (BrainMagic, AWavNet, ConvConcatNet) achieve F1-macro scores of 0.9155 (Standard Track) and 0.9184 (Extended Track) on the leaderboard test set.
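For concreteness, a simple mean-probability ensemble and the reported F1-macro metric can be sketched as follows. This is a hedged illustration: the actual combination rule used for SHINE and the baselines may differ, and the 0.5 decision threshold is an assumption.

```python
import numpy as np

def ensemble_predict(prob_list, threshold=0.5):
    """Average per-model speech probabilities over time, then threshold
    to obtain a binary speech/silence sequence (simple mean ensemble;
    the paper's combination rule may differ)."""
    return (np.mean(prob_list, axis=0) >= threshold).astype(int)

def f1_macro(y_true, y_pred):
    """Macro-averaged F1 over the two classes (silence=0, speech=1):
    per-class F1 scores are computed independently and averaged, so the
    rarer silence class counts as much as the dominant speech class."""
    scores = []
    for cls in (0, 1):
        tp = np.sum((y_pred == cls) & (y_true == cls))
        fp = np.sum((y_pred == cls) & (y_true != cls))
        fn = np.sum((y_pred != cls) & (y_true == cls))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return float(np.mean(scores))
```

Macro averaging is the natural choice here because audiobook recordings are heavily dominated by speech frames; a plain accuracy or micro-F1 score would reward a model that simply predicts "speech" everywhere.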