🤖 AI Summary
To address weak cross-dataset generalization in EEG-based affective computing caused by heterogeneous annotation schemes and inconsistent label semantics, this paper proposes a valence-arousal (V-A) guided contrastive learning framework. Methodologically, it pairs a Triple-Domain Encoder with a Spatial-Temporal Transformer backbone to align multi-source EEG signals, both semantically and structurally, within a unified V-A embedding space, and introduces a soft-weighted supervised contrastive loss to enhance the discriminability and transferability of emotion representations. The model is pre-trained on eight public EEG datasets and evaluated on three cross-dataset benchmarks, achieving state-of-the-art performance in all settings and demonstrating substantial gains in classification accuracy and adaptability to unseen scenarios. The core contribution is the first deep integration of V-A space modeling with contrastive learning, establishing a unified representation paradigm for cross-dataset generalization in EEG-based emotion recognition.
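To make the unified V-A embedding space concrete, the following minimal sketch shows one way heterogeneous labels could be projected into shared valence-arousal coordinates. The category-to-coordinate mapping and the 1-9 rating scale below are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Hypothetical coordinates for discrete emotion categories in V-A space.
# These placeholder values are assumptions for illustration only.
CATEGORY_TO_VA = {
    "happy":   ( 0.8,  0.6),
    "sad":     (-0.7, -0.4),
    "fear":    (-0.6,  0.7),
    "neutral": ( 0.0,  0.0),
}

def discrete_to_va(label: str) -> np.ndarray:
    """Map a discrete emotion category to a point in the unified V-A space."""
    return np.asarray(CATEGORY_TO_VA[label], dtype=np.float32)

def continuous_to_va(valence: float, arousal: float,
                     scale_min: float = 1.0, scale_max: float = 9.0) -> np.ndarray:
    """Rescale dataset-specific continuous ratings (e.g. 1-9 SAM scores) to [-1, 1]."""
    rescale = lambda x: 2.0 * (x - scale_min) / (scale_max - scale_min) - 1.0
    return np.asarray([rescale(valence), rescale(arousal)], dtype=np.float32)
```

Under such a projection, samples from datasets with categorical labels and datasets with continuous ratings can be compared directly by their distance in V-A space.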
📝 Abstract
Emotion recognition from EEG signals is essential for affective computing and has been widely explored with deep learning. While recent deep learning approaches achieve strong performance on individual EEG emotion datasets, their generalization across datasets remains limited due to heterogeneity in annotation schemes and data formats. Existing models typically require dataset-specific architectures tailored to the input structure and lack semantic alignment across diverse emotion labels. To address these challenges, we propose EMOD, a unified EEG emotion representation framework leveraging valence-arousal (V-A) guided contrastive learning. EMOD learns transferable, emotion-aware representations from heterogeneous datasets by bridging both semantic and structural gaps. Specifically, we project discrete and continuous emotion labels into a unified V-A space and formulate a soft-weighted supervised contrastive loss that encourages emotionally similar samples to cluster in the latent space. To accommodate variable EEG formats, EMOD employs a flexible backbone comprising a Triple-Domain Encoder followed by a Spatial-Temporal Transformer, enabling robust extraction and integration of temporal, spectral, and spatial features. We pre-train EMOD on eight public EEG datasets and evaluate it on three benchmark datasets. Experimental results show that EMOD achieves state-of-the-art performance, demonstrating strong adaptability and generalization across diverse EEG-based emotion recognition scenarios.
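As a rough illustration of how a soft-weighted supervised contrastive loss over a V-A space might look, the PyTorch sketch below weights positive pairs by a Gaussian kernel over their pairwise V-A distance. The function name, the Gaussian weighting scheme, and the hyperparameters (temperature, sigma) are assumptions made for this sketch; the paper's exact formulation may differ.

```python
import torch
import torch.nn.functional as F

def soft_weighted_supcon_loss(z: torch.Tensor, va: torch.Tensor,
                              temperature: float = 0.1,
                              sigma: float = 0.5) -> torch.Tensor:
    """Illustrative soft-weighted supervised contrastive loss.

    z  : (N, D) embeddings of a batch of EEG segments.
    va : (N, 2) valence-arousal coordinates of the same samples.
    Pairs that are close in V-A space receive larger positive weights.
    """
    z = F.normalize(z, dim=1)
    sim = z @ z.t() / temperature                      # pairwise cosine similarities
    va_dist = torch.cdist(va, va)                      # pairwise V-A distances
    weights = torch.exp(-(va_dist ** 2) / (2 * sigma ** 2))

    # Exclude self-pairs from the weights and the softmax denominator.
    eye = torch.eye(z.size(0), dtype=torch.bool, device=z.device)
    weights = weights.masked_fill(eye, 0.0)
    logits = sim.masked_fill(eye, float("-inf"))

    log_prob = logits - torch.logsumexp(logits, dim=1, keepdim=True)
    log_prob = log_prob.masked_fill(eye, 0.0)          # avoid 0 * (-inf) on the diagonal

    # Weighted average of log-probabilities over soft positives for each anchor.
    loss = -(weights * log_prob).sum(dim=1) / weights.sum(dim=1).clamp_min(1e-8)
    return loss.mean()
```

In pre-training, a batch drawn from multiple datasets, with labels already mapped to V-A coordinates, would be embedded by the Triple-Domain Encoder and Spatial-Temporal Transformer backbone and passed to such a loss, so that emotionally similar samples are pulled together in the latent space regardless of their source dataset.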