🤖 AI Summary
This work investigates cross-modal transfer of a pretrained neural audio codec (DAC) to electroencephalogram (EEG) compression. To bridge fundamental disparities between EEG and audio, namely in sampling rate, channel topology, and signal scale, we propose DAC-MC, a multi-channel extension architecture: (i) stride-based framing adapts raw EEG inputs while preserving the pretrained audio weights; (ii) attention-driven cross-channel aggregation and channel-specific decoding model spatial dependencies among electrodes; and (iii) tuned residual codebook depth, codebook size, and input sampling rate balance compression against representational fidelity. Evaluated on the TUH Abnormal and Epilepsy datasets, DAC-MC significantly outperforms from-scratch baselines, achieving high signal reconstruction fidelity, low spectrogram distortion, and stable downstream classification accuracy. The results demonstrate that audio-pretrained models are not only feasible but superior starting points for efficient, clinically informative EEG compression.
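The stride-based framing step is not spelled out above; as a rough illustration, one might right-pad each EEG channel to a multiple of the codec's total encoder stride and rescale microvolt-range amplitudes into the [-1, 1] range audio models expect. The function name, stride value, and scale factor below are all hypothetical choices, not the paper's exact preprocessing:

```python
import numpy as np

def frame_eeg_for_codec(eeg, stride=512, scale=1e4):
    """Map raw multi-channel EEG onto a stride-aligned, audio-like input.

    eeg    : (channels, samples) float array (illustrative units: volts)
    stride : the codec's total encoder stride; input length must be a
             multiple of this so the conv stack downsamples cleanly
    scale  : hypothetical gain mapping microvolt-range EEG into [-1, 1]
    """
    c, t = eeg.shape
    pad = (-t) % stride                       # right-pad to a stride multiple
    x = np.pad(eeg, ((0, 0), (0, pad)))
    x = np.clip(x * scale, -1.0, 1.0)         # amplitude-match to audio range
    # each channel becomes one mono "waveform" batch item for the codec
    return x[:, None, :]                      # (channels, 1, samples_padded)
```

With this framing, every electrode can be fed through the audio-pretrained encoder-decoder unchanged, which is what makes direct weight reuse possible.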
📝 Abstract
EEG and audio are inherently distinct modalities, differing in sampling rate, channel structure, and scale. Yet we show that pretrained neural audio codecs can serve as effective starting points for EEG compression, provided the data are preprocessed to be compatible with the codec's input constraints. Using DAC, a state-of-the-art neural audio codec, as our base, we demonstrate that raw EEG can be mapped into the codec's stride-based framing, enabling direct reuse of the audio-pretrained encoder-decoder. Even without modification, this setup yields stable EEG reconstructions, and fine-tuning on EEG data further improves fidelity and generalization over training from scratch. We systematically explore compression-quality trade-offs by varying residual codebook depth, codebook (vocabulary) size, and input sampling rate. To capture spatial dependencies across electrodes, we propose DAC-MC, a multi-channel extension with attention-based cross-channel aggregation and channel-specific decoding that retains the audio-pretrained initialization. Evaluations on the TUH Abnormal and Epilepsy datasets show that the adapted codecs preserve clinically relevant information, as reflected in spectrogram-based reconstruction loss and downstream classification accuracy.
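The attention-based cross-channel aggregation can be pictured as treating the electrodes at each time step as tokens that attend to one another. The sketch below is a minimal illustration under assumed conventions (latents shaped as channels x dim x time, identity query/key/value projections for brevity); the paper's actual module would use learned projections and is not reproduced here:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_channel_attention(z):
    """Illustrative attention over the electrode axis.

    z : (channels, dim, time) per-electrode codec latents.
    At each time step, every channel attends over all channels, so the
    aggregated features carry spatial context before quantization.
    """
    c, d, t = z.shape
    x = z.transpose(2, 0, 1)                          # (time, channels, dim)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)    # (T, C, C) similarities
    agg = softmax(scores, axis=-1) @ x                # weighted channel mix
    return agg.transpose(1, 2, 0)                     # back to (C, D, T)
```

Channel-specific decoding would then map each aggregated latent stream back to its own electrode, while the shared encoder keeps its audio-pretrained weights.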