OpenECG: Benchmarking ECG Foundation Models with Public 1.2 Million Records

📅 2025-03-02
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work evaluates how well electrocardiogram (ECG) foundation models (ECG-FMs) trained exclusively on public data generalize. The authors introduce OpenECG, a large-scale open benchmark of 1.2 million 12-lead ECG records from nine centers, and evaluate three self-supervised learning methods (SimCLR, BYOL, and MAE) on ResNet-50 and Vision Transformer (ViT) backbones, using leave-one-dataset-out experiments and a data scaling analysis. Key findings are: (1) pre-training on diverse public datasets significantly improves generalization, and ECG-FMs trained solely on public data can match or surpass those trained on proprietary datasets; (2) BYOL (feature-consistency learning) and MAE (generative learning) outperform the contrastive SimCLR, and their performance saturates at 60-70% of the full dataset, whereas SimCLR requires more data; and (3) the results support open-data-driven development of clinically meaningful ECG-FMs.

📝 Abstract
This study introduces OpenECG, a large-scale benchmark of 1.2 million 12-lead ECG recordings from nine centers, to evaluate ECG foundation models (ECG-FMs) trained on public datasets. We investigate three self-supervised learning methods (SimCLR, BYOL, MAE) with ResNet-50 and Vision Transformer architectures, assessing model generalization through leave-one-dataset-out experiments and data scaling analysis. Results show that pre-training on diverse datasets significantly improves generalization, with BYOL and MAE outperforming SimCLR, highlighting the efficacy of feature-consistency and generative learning over contrastive approaches. Data scaling experiments reveal that performance saturates at 60-70% of total data for BYOL and MAE, while SimCLR requires more data. These findings demonstrate that publicly available ECG data can match or surpass proprietary datasets in training robust ECG-FMs, paving the way for scalable, clinically meaningful AI-driven ECG analysis.
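
The leave-one-dataset-out protocol mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `leave_one_dataset_out`, the source names, and the record IDs are all hypothetical, and the sketch only assumes that each record is tagged with the dataset it came from.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_dataset_out(
    records_by_source: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_name, pretrain_ids, eval_ids), one fold per source.

    Each fold pre-trains on every source except one and evaluates on the
    source that was held out.
    """
    for held_out in sorted(records_by_source):
        pretrain_ids = [
            rec_id
            for name, ids in sorted(records_by_source.items())
            if name != held_out
            for rec_id in ids
        ]
        yield held_out, pretrain_ids, list(records_by_source[held_out])


# Hypothetical toy data: three sources instead of the nine in OpenECG.
toy_sources = {
    "center_a": ["a1", "a2"],
    "center_b": ["b1"],
    "center_c": ["c1", "c2", "c3"],
}
folds = list(leave_one_dataset_out(toy_sources))
for held_out, pretrain_ids, eval_ids in folds:
    print(held_out, len(pretrain_ids), len(eval_ids))
```

Holding out a whole source, rather than a random subset of records, is what makes the evaluation a test of generalization to an unseen center.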
Problem

Research questions and friction points this paper is trying to address.

Evaluating ECG foundation models using 1.2 million public ECG records.
Comparing self-supervised learning methods for ECG model generalization.
Demonstrating public ECG data's efficacy in training robust AI models.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Three self-supervised learning methods (SimCLR, BYOL, MAE) benchmarked on ResNet-50 and Vision Transformer backbones
BYOL and MAE outperform SimCLR in generalization
Performance saturates at 60-70% data for BYOL/MAE
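
One way to read the saturation finding is as the smallest data fraction whose score is already within a small tolerance of the full-data score. The sketch below shows that reading under stated assumptions: the helper names `nested_subsets` and `saturation_fraction`, the tolerance, and the toy scores are hypothetical and are not results or procedures from the paper.

```python
import random
from typing import Dict, List, Sequence


def nested_subsets(
    ids: Sequence[str], fractions: Sequence[float], seed: int = 0
) -> Dict[float, List[str]]:
    """Draw nested subsets so each larger fraction contains the smaller ones."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return {f: shuffled[: round(f * len(shuffled))] for f in fractions}


def saturation_fraction(
    fractions: Sequence[float], scores: Sequence[float], tol: float = 0.01
) -> float:
    """Return the smallest fraction scoring within `tol` of the full-data score."""
    pairs = sorted(zip(fractions, scores))
    full_score = pairs[-1][1]  # score at the largest fraction
    for fraction, score in pairs:
        if score >= full_score - tol:
            return fraction
    return pairs[-1][0]


# Made-up numbers for illustration only; not taken from the paper.
toy_fractions = [0.2, 0.4, 0.6, 0.8, 1.0]
toy_scores = [0.70, 0.78, 0.845, 0.848, 0.85]

subsets = nested_subsets([f"rec{i}" for i in range(10)], toy_fractions)
knee = saturation_fraction(toy_fractions, toy_scores, tol=0.01)
print(knee)
```

Nesting the subsets keeps the comparison across fractions from being confounded by which records happened to be sampled at each size.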
Zhijiang Wan
Nanchang University
Artificial Intelligence
Qianhao Yu
School of Information Engineering, Nanchang University, Jiangxi, China
Jia Mao
School of Information Engineering, Nanchang University, Jiangxi, China
Wenfeng Duan
Department of Radiology, The First Affiliated Hospital, Jiangxi Medical College, Nanchang University, Nanchang 330006, China
Cheng Ding
Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta 30332-0315, United States