🤖 AI Summary
To address three key challenges in cross-dataset affective EEG recognition—distribution shift, inconsistent affect label definitions across datasets, and high inter-subject variability—this paper proposes a multi-dataset joint pretraining framework. Methodologically, it introduces a novel cross-dataset covariance alignment loss and a hybrid encoder integrating channel-wise Mamba-like linear attention with spatiotemporal dynamic modeling, enabling robust second-order statistical feature alignment and calibration-free recognition. Crucially, the method operates without requiring labeled calibration data from the target domain. Experimental results demonstrate substantial improvements: an average 4.57% gain in AUROC for few-shot emotion recognition and an 11.92% increase in zero-shot transfer accuracy. Moreover, scaling the number of pretraining datasets consistently enhances performance, achieving up to an 8.55% improvement over single-dataset training baselines.
📝 Abstract
Task-specific pre-training is essential when task representations diverge from generic pre-training features. Existing task-general pre-trained EEG models struggle with complex tasks like emotion recognition due to mismatches between task-specific features and broad pre-training approaches. This work develops a task-specific multi-dataset joint pre-training framework for cross-dataset emotion recognition, tackling large inter-dataset distribution shifts, inconsistent emotion category definitions, and substantial inter-subject variability. We introduce a cross-dataset covariance alignment loss to align second-order statistical properties across datasets, enabling robust generalization without the need for extensive labels or per-subject calibration. To capture the long-term dependencies and complex dynamics of EEG, we propose a hybrid encoder combining a Mamba-like linear attention channel encoder with a spatiotemporal dynamics model. Our method outperforms state-of-the-art large-scale EEG models by an average of 4.57% in AUROC for few-shot emotion recognition and by 11.92% in accuracy for zero-shot generalization to a new dataset. Performance scales with the number of datasets used in pre-training: multi-dataset joint pre-training achieves a performance gain of 8.55% over single-dataset training. This work provides a scalable framework for task-specific pre-training and highlights its benefit for generalizable affective computing. Our code is available at https://github.com/ncclab-sustech/mdJPT_nips2025.
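The covariance alignment loss aligns second-order statistics of features drawn from different datasets. The abstract does not give the exact formulation, so the following is a minimal NumPy sketch of one plausible version: a CORAL-style squared Frobenius distance between the covariance matrices of two feature batches. The function names and the normalization constant are illustrative assumptions, not the paper's definitive loss.

```python
import numpy as np

def covariance(features):
    """Covariance of a (samples, features) matrix, features centered per column."""
    centered = features - features.mean(axis=0, keepdims=True)
    return centered.T @ centered / (features.shape[0] - 1)

def covariance_alignment_loss(feats_a, feats_b):
    """CORAL-style alignment loss between two feature batches (assumed form):
    squared Frobenius norm of the covariance difference, scaled by 1/(4 d^2)."""
    d = feats_a.shape[1]
    diff = covariance(feats_a) - covariance(feats_b)
    return np.sum(diff ** 2) / (4.0 * d * d)

# Toy usage: identical batches align perfectly; rescaled features do not.
rng = np.random.RandomState(0)
x = rng.randn(200, 8)          # e.g. pooled features from dataset A
y = rng.randn(200, 8) * 3.0    # dataset B with a different feature scale
print(covariance_alignment_loss(x, x))  # 0.0: covariances match exactly
print(covariance_alignment_loss(x, y))  # positive: second-order mismatch
```

In a pre-training loop this term would be added to the task loss so that the encoder is pushed to produce features whose covariance structure agrees across datasets, which is one common way to mitigate inter-dataset distribution shift.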