OmniEEG-Bench: A Standardized Evaluation Benchmark for EEG Foundation Models

📅 2026-05-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of comparability among existing EEG foundation models, which stems from heterogeneous datasets and inconsistent task protocols. It introduces OmniEEG-Bench, a unified evaluation benchmark organized into six task families: signal reliability, biometrics and disease, consciousness and state, cognition and emotion, naturalistic stimulus decoding, and motor and interaction. The benchmark unifies 54 datasets under consistent evaluation protocols and standardizes model deployment, task definitions, and metrics through a task-card specification. A benchmark of ten representative models finds that both pretraining data diversity and model size are significantly associated with better average ranks across datasets. The authors interpret this as scaling-law behavior in EEG foundation models and conclude that scaling requires broader, more diverse pretraining data as well as larger architectures.
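To make the task-card idea concrete, here is a minimal sketch of what such a record could look like. The class name `TaskCard`, its fields, and the example values are hypothetical and chosen for illustration; the benchmark's actual schema may differ. Only the six task family names come from the abstract.

```python
from dataclasses import dataclass

# Task family names as listed in the abstract.
TASK_FAMILIES = (
    "signal reliability",
    "biometrics and disease",
    "consciousness and state",
    "cognition and emotion",
    "naturalistic stimulus decoding",
    "motor and interaction",
)


@dataclass(frozen=True)
class TaskCard:
    """Hypothetical task card: one record per benchmark task."""

    name: str      # task identifier (illustrative)
    family: str    # must be one of TASK_FAMILIES
    dataset: str   # dataset the task is evaluated on (illustrative)
    metric: str    # metric reported for this task (illustrative)

    def __post_init__(self):
        # Reject cards that do not belong to a known task family.
        if self.family not in TASK_FAMILIES:
            raise ValueError(f"unknown task family: {self.family!r}")


# Placeholder values, not an actual benchmark entry.
card = TaskCard(
    name="example-task",
    family="motor and interaction",
    dataset="example-dataset",
    metric="balanced_accuracy",
)
```

Fixing the task definition and metric in one validated record is one way to ensure every model is scored under the same protocol.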

📝 Abstract

Electroencephalography (EEG) supports a variety of brain-computer interface (BCI) tasks ranging from brain-state monitoring to human-LLM interactions. EEG foundation models are emerging, but evaluation remains fragmented due to heterogeneous datasets and inconsistent task protocols. Here, we introduce OmniEEG-Bench, a unified benchmark and downstream task roadmap for EEG foundation models (FMs). It organizes evaluation of EEG FMs into six task families spanning (i) signal reliability, (ii) biometrics and disease, (iii) consciousness and state, (iv) cognition and emotion, (v) naturalistic stimulus decoding, and (vi) motor and interaction, introducing a new generation of tasks not systematically benchmarked in prior EEG FM work. OmniEEG-Bench standardizes model deployment, task definitions, and metrics through a task-card specification, and unifies 54 EEG datasets with consistent evaluation protocols. We benchmark 10 representative EEG foundation models and report a leaderboard that covers diverse evaluation settings. Both pretraining dataset diversity and model size are significantly associated with better average ranks across datasets, revealing scaling-law behavior in EEG foundation models (Figure 1). These results suggest that scaling EEG foundation models requires not only larger architectures but also broader and more diverse pretraining data. The benchmark code is available at https://github.com/ncclab-sustech/omni-eegbench.git.
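The "average rank across datasets" used to compare models can be illustrated with a small sketch. It assumes higher scores are better and that tied models share the mean of their positions; the paper's exact ranking and tie-handling procedure is not stated in the abstract. The function names and the toy scores are made up for illustration and are not results from the paper.

```python
from statistics import mean


def rank_descending(scores):
    """Rank models within one dataset: 1 = highest score; ties share the mean rank."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    ranks = {}
    i = 0
    while i < len(ordered):
        # Extend j over the run of models tied with position i.
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared
        i = j + 1
    return ranks


def average_ranks(results):
    """Map {dataset: {model: score}} to {model: mean rank across datasets}."""
    per_dataset = [rank_descending(scores) for scores in results.values()]
    models = per_dataset[0].keys()
    return {m: mean(r[m] for r in per_dataset) for m in models}


# Made-up scores for three hypothetical models on two hypothetical datasets.
toy = {
    "dataset_a": {"small": 0.60, "medium": 0.65, "large": 0.70},
    "dataset_b": {"small": 0.55, "medium": 0.50, "large": 0.62},
}
print(average_ranks(toy))
```

Ranking within each dataset before averaging keeps datasets with very different score scales from dominating the comparison, which matters when aggregating over many heterogeneous tasks.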

Problem

Research questions and friction points this paper is trying to address.

EEG foundation models

evaluation benchmark

standardized evaluation

heterogeneous datasets

task protocols

Innovation

Methods, ideas, or system contributions that make the work stand out.

EEG foundation models

standardized benchmark

scaling law