PRISM: Exploring Heterogeneous Pretrained EEG Foundation Model Transfer to Clinical Differential Diagnosis

📅 2026-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the limited generalizability of existing EEG foundation models, which are typically pretrained on single-source clinical data and struggle to disentangle neurophysiological signals from device- or site-specific artifacts. To overcome this, the authors construct a multi-center EEG pretraining corpus with deliberate geographic and device diversity and employ a unified masked autoencoder (MAE) architecture with consistent preprocessing. They systematically investigate how data heterogeneity influences transfer performance on downstream clinical tasks, revealing a critical trade-off between data diversity and model transferability. Their findings demonstrate that strategically curated diverse data outperforms indiscriminate scale expansion, achieving a 12.3-percentage-point gain in balanced accuracy on an unseen seizure-versus-mimics discrimination task—matching or surpassing the REVE model trained on 92 datasets—and identify six key non-additive bias factors that critically impact evaluation.
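The masked-autoencoder pretraining the summary refers to can be sketched in miniature. This is a toy illustration only, not the paper's architecture: patch lengths, the mask ratio, and the trivial mean-based "decoder" are all stand-in assumptions (a real MAE reconstructs masked patches with a learned transformer decoder).

```python
import numpy as np

rng = np.random.default_rng(0)

def mae_recon_loss(segment, patch_len=50, mask_ratio=0.5):
    """Toy MAE objective on one EEG segment of shape (channels, time)."""
    # Split the time axis into non-overlapping patches.
    c, t = segment.shape
    n_patches = t // patch_len
    patches = segment[:, :n_patches * patch_len].reshape(c, n_patches, patch_len)
    # Randomly mask a fraction of the patches.
    n_masked = max(1, int(round(mask_ratio * n_patches)))
    masked_idx = rng.choice(n_patches, size=n_masked, replace=False)
    visible = np.delete(patches, masked_idx, axis=1)
    # Stand-in "decoder": predict every masked patch as the per-channel
    # mean of the visible patches (a real MAE learns this mapping).
    pred = visible.mean(axis=(1, 2), keepdims=True)   # (c, 1, 1), broadcasts below
    target = patches[:, masked_idx, :]                # (c, n_masked, patch_len)
    # Reconstruction MSE is scored on the masked patches only.
    return float(np.mean((target - pred) ** 2))

segment = rng.standard_normal((19, 1000))  # e.g. 19-channel, 1000-sample segment
loss = mae_recon_loss(segment)
```

The point of the ablation design described above is that this objective, preprocessing, and architecture stay fixed while only the pretraining population varies.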

📝 Abstract
EEG foundation models are typically pretrained on narrow-source clinical archives and evaluated on benchmarks from the same ecosystem, leaving unclear whether representations encode neural physiology or recording-distribution artifacts. We introduce PRISM (Population Representative Invariant Signal Model), a masked autoencoder ablated along two axes -- pretraining population and downstream adaptation -- with architecture and preprocessing fixed. We compare a narrow-source EU/US corpus (TUH + PhysioNet) against a geographically diverse pool augmented with multi-center South Asian clinical recordings across multiple EEG systems. Three findings emerge. First, narrow-source pretraining yields stronger linear probes on distribution-matched benchmarks, while diverse pretraining produces more adaptable representations under fine-tuning -- a trade-off invisible under single-protocol evaluation. Trained on three source corpora, PRISM matches or outperforms REVE (92 datasets, 60,000+ hours) on the majority of tasks, demonstrating that targeted diversity can substitute for indiscriminate scale and that dataset count is a confounding variable in model comparison. Second, on a clinically challenging and previously untested task -- distinguishing epilepsy from diagnostic mimickers via interictal EEG -- the diverse checkpoint outperforms the narrow-source checkpoint by +12.3 pp balanced accuracy, the largest gap across all evaluations. Third, systematic inconsistencies between EEG-Bench and EEG-FM-Bench reverse model rankings on identical datasets by up to 24 pp; we identify six concrete sources including split construction, checkpoint selection, segment length, and normalization, showing these factors compound non-additively.
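The abstract's headline gap is reported in balanced accuracy, which is the mean of per-class recalls rather than raw accuracy. A minimal sketch (toy labels, not the paper's data) shows why this matters for an imbalanced task like seizure-vs-mimics:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    # Mean of per-class recalls, so the majority class cannot dominate the score.
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.mean(recalls))

# Hypothetical split: 6 mimics (label 0), 2 seizures (label 1).
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 1])
# Plain accuracy is 7/8 = 0.875, but balanced accuracy averages the
# per-class recalls: (6/6 + 1/2) / 2 = 0.75.
bacc = balanced_accuracy(y_true, y_pred)
```

On such a task, a +12.3 pp difference in this metric reflects genuinely better minority-class recall, not just majority-class bias.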
Problem

Research questions and friction points this paper addresses.

EEG foundation model
clinical differential diagnosis
distribution shift
benchmark inconsistency
epilepsy mimickers
Innovation

Methods, ideas, or system contributions that make the work stand out.

EEG foundation model
pretraining diversity
clinical transfer learning
benchmark inconsistency
masked autoencoder
Jeet Bandhu Lahiri
Indian Institute of Technology Mandi, India
Parshva Runwal
NeuroDx, India
Arvasu Kulkarni
NeuroDx, India
Mahir Jain
NeuroDx, India
Aditya Ray Mishra
NeuroDx, India
Siddharth Panwar
Indian Institute of Technology Mandi, India; NeuroDx, India
Sandeep Singh
Assistant Professor, IIT Roorkee, India
Communication and Optical Networks, Quantum optics, Datacenter, Machine Learning, Stochastic