EEG-Bench: A Benchmark for EEG Foundation Models in Clinical Applications

📅 2025-11-28
🤖 AI Summary
Clinical evaluation of EEG foundation models suffers from inconsistent preprocessing, a lack of standardized benchmarks, and insufficient validation of generalization under real-world distribution shifts.

Method: We propose the first unified evaluation framework tailored to authentic clinical settings, covering 11 neuropsychiatric diagnostic tasks across 14 publicly available datasets. Our protocol enforces minimal preprocessing, standardized data loading and splitting, explicit simulation of cross-site distributional shifts, and multi-task performance normalization.

Contribution/Results: This work establishes the first systematic out-of-distribution (OOD) clinical evaluation paradigm for EEG foundation models. Rigorous comparative experiments—spanning classical machine learning and Transformer-based architectures—reveal that while current foundation models excel on specific tasks, they frequently underperform lightweight traditional models under realistic distributional shifts. To foster reproducibility and trustworthiness, we fully open-source all datasets, code, and evaluation tools, advancing standardized, clinically grounded assessment of medical AI.
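The cross-site distribution-shift protocol described above can be sketched as a leave-one-site-out evaluation loop. This is a minimal illustrative sketch, not the benchmark's actual code: the site names, toy features, chance-level normalization, and classifier choice are all assumptions made here for illustration.

```python
# Hypothetical leave-one-site-out (OOD) evaluation sketch: train on all
# sites but one, test on the held-out site, then normalize scores so that
# chance level maps to 0 and perfect performance maps to 1, making results
# comparable across tasks. Data below is synthetic stand-in EEG features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score

rng = np.random.default_rng(0)

# Toy features/labels from three hypothetical recording sites, each with
# a slightly different feature distribution (simulating a site shift).
sites = {
    "site_A": (rng.normal(0.0, 1.0, (80, 16)), rng.integers(0, 2, 80)),
    "site_B": (rng.normal(0.3, 1.1, (60, 16)), rng.integers(0, 2, 60)),
    "site_C": (rng.normal(-0.2, 0.9, (70, 16)), rng.integers(0, 2, 70)),
}

scores = {}
for held_out in sites:
    # OOD split: the held-out site never contributes training data.
    X_tr = np.vstack([X for s, (X, _) in sites.items() if s != held_out])
    y_tr = np.concatenate([y for s, (_, y) in sites.items() if s != held_out])
    X_te, y_te = sites[held_out]
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    scores[held_out] = balanced_accuracy_score(y_te, clf.predict(X_te))

# Chance-level normalization for a balanced binary task (chance = 0.5).
normalized = {s: (v - 0.5) / (1 - 0.5) for s, v in scores.items()}
print({s: round(v, 3) for s, v in normalized.items()})
```

In the actual benchmark, a foundation model's embeddings or fine-tuned head would take the place of the logistic-regression baseline, and the held-out "site" corresponds to a clinical recording center.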

📝 Abstract
We introduce a unified benchmarking framework focused on evaluating EEG-based foundation models in clinical applications. The benchmark spans 11 well-defined diagnostic tasks across 14 publicly available EEG datasets, including epilepsy, schizophrenia, Parkinson's disease, OCD, and mild traumatic brain injury. It features minimal preprocessing, standardized evaluation protocols, and enables side-by-side comparisons of classical baselines and modern foundation models. Our results show that while foundation models achieve strong performance in certain settings, simpler models often remain competitive, particularly under clinical distribution shifts. To facilitate reproducibility and adoption, we release all prepared data and code in an accessible and extensible format.
Problem

Research questions and friction points this paper is trying to address.

Benchmarking EEG foundation models for clinical diagnostic tasks
Evaluating model performance across 11 tasks and 14 public datasets
Comparing foundation models with simpler baselines under distribution shifts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified benchmarking framework for EEG foundation models
Standardized evaluation across 11 diagnostic tasks and 14 datasets
Minimal preprocessing and accessible data and code release
Authors

Ard Kastrati, Josua Bürki, Jonas Lauer, Cheng Xuan, Raffaele Iaquinto, R. Wattenhofer (ETH Zurich)