🤖 AI Summary
This work addresses the weak signal strength, noise sensitivity, and inter-channel variability that make electroencephalography (EEG)-to-music reconstruction difficult by proposing a channel-oriented end-to-end framework. The approach integrates channel-level tokenization, multi-view self-distillation, and structured channel dropout within an encoding-alignment-decoding pipeline, preserving spatially localized neural information and mitigating the information loss caused by premature channel mixing. It systematically demonstrates the critical role of channel structure in semantic alignment and provides a theoretical analysis of when preserving that structure improves alignment. Experimental results show that the proposed method significantly outperforms current state-of-the-art baselines across multiple evaluation metrics, supporting the value of channel-oriented design for both reconstruction accuracy and robustness.
📝 Abstract
Brain-computer interfaces aim to decode naturalistic stimuli from neural signals, yet most progress to date has focused on vision and language. In this article, we study a more challenging and far less explored setting, EEG-to-music reconstruction, where signals are weak, distributed, and highly susceptible to noise and channel variability. Our central finding is that early channel mixing destroys weak but discriminative EEG signals. To address this, we propose a channel-oriented design with three key components: channel-wise tokenization, which treats each electrode as an explicit token to retain spatially localized neural evidence; channel-wise multi-view self-distillation, which enforces consistency across temporal crops and random channel subsets to learn robust and distributed representations; and channel-wise data augmentation, which introduces structured channel dropout to improve invariance to noise, artifacts, and missing electrodes. Together, these components preserve weak yet informative signals across channels and enable stable alignment to a semantic music representation space. We integrate this channel-oriented design within an encoding-alignment-decoding pipeline for EEG-to-music reconstruction. Theoretically, we characterize when preserving channel-level structure leads to improved alignment. Empirically, we compare with a range of state-of-the-art baselines and demonstrate consistent and significant performance gains.
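To make the three channel-wise components concrete, the following is a minimal NumPy sketch and not the authors' implementation. Every function name, array shape, and parameter here is an assumption chosen for illustration: the per-channel linear projection stands in for whatever tokenizer the paper uses, and a plain cosine distance between mean-pooled tokens stands in for the self-distillation objective, which the abstract does not specify.

```python
import numpy as np


def channel_tokens(eeg, weight, bias):
    """Map each electrode's time series to one token (hypothetical tokenizer).

    eeg: (channels, time), weight: (time, dim), bias: (dim,).
    The same projection is applied to every row, so no information is
    mixed across channels at this stage.
    """
    return eeg @ weight + bias  # (channels, dim)


def structured_channel_dropout(eeg, drop_prob, rng):
    """Zero out whole channels, mimicking noisy or missing electrodes."""
    keep = rng.random(eeg.shape[0]) >= drop_prob
    if not keep.any():  # always keep at least one channel
        keep[rng.integers(eeg.shape[0])] = True
    return eeg * keep[:, None], keep


def make_view(eeg, crop_len, n_keep, rng):
    """Build one view: a random temporal crop over a random channel subset."""
    start = rng.integers(0, eeg.shape[1] - crop_len + 1)
    chans = np.sort(rng.choice(eeg.shape[0], size=n_keep, replace=False))
    return eeg[chans, start:start + crop_len], chans


def consistency_loss(tokens_a, tokens_b):
    """Cosine distance between mean-pooled token sets (assumed stand-in loss)."""
    a = tokens_a.mean(axis=0)
    b = tokens_b.mean(axis=0)
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return 1.0 - cos


rng = np.random.default_rng(0)
eeg = rng.standard_normal((32, 256))             # 32 electrodes, 256 samples
weight = rng.standard_normal((128, 16)) * 0.1    # projection for 128-sample crops
bias = np.zeros(16)

augmented, kept = structured_channel_dropout(eeg, drop_prob=0.2, rng=rng)
view_a, _ = make_view(augmented, crop_len=128, n_keep=24, rng=rng)
view_b, _ = make_view(augmented, crop_len=128, n_keep=24, rng=rng)
loss = consistency_loss(channel_tokens(view_a, weight, bias),
                        channel_tokens(view_b, weight, bias))
```

Pooling happens only after tokenization, so the two views may cover different electrode subsets and still be compared. This illustrates the late-mixing idea in the abstract under the assumptions stated above.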