OpenECG: Benchmarking ECG Foundation Models with Public 1.2 Million Records

📅 2025-03-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work evaluates how well electrocardiogram (ECG) foundation models (ECG-FMs) generalize when trained exclusively on public data. It introduces OpenECG, a large-scale benchmark of 1.2 million 12-lead ECG recordings from nine centers, and evaluates three self-supervised learning methods (SimCLR, BYOL, and MAE) on ResNet-50 and Vision Transformer (ViT) backbones, using leave-one-dataset-out experiments and data scaling analysis. Key findings are: (1) pre-training on diverse datasets significantly improves generalization; (2) BYOL and MAE outperform SimCLR, which suggests that feature-consistency and generative objectives are more effective than contrastive learning for ECG, and both saturate at 60-70% of the total data while SimCLR requires more; and (3) publicly available ECG data can match or surpass proprietary datasets for training robust ECG-FMs.

📝 Abstract

This study introduces OpenECG, a large-scale benchmark of 1.2 million 12-lead ECG recordings from nine centers, to evaluate ECG foundation models (ECG-FMs) trained on public datasets. We investigate three self-supervised learning methods (SimCLR, BYOL, MAE) with ResNet-50 and Vision Transformer architectures, assessing model generalization through leave-one-dataset-out experiments and data scaling analysis. Results show that pre-training on diverse datasets significantly improves generalization, with BYOL and MAE outperforming SimCLR, highlighting the efficacy of feature-consistency and generative learning over contrastive approaches. Data scaling experiments reveal that performance saturates at 60-70% of total data for BYOL and MAE, while SimCLR requires more data. These findings demonstrate that publicly available ECG data can match or surpass proprietary datasets in training robust ECG-FMs, paving the way for scalable, clinically meaningful AI-driven ECG analysis.
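The leave-one-dataset-out protocol mentioned in the abstract can be illustrated with a minimal sketch. It assumes the records are already grouped by source dataset; the function name `leave_one_dataset_out` and the toy source names are hypothetical and are not taken from the paper or its code.

```python
from typing import Dict, Iterator, List, Tuple


def leave_one_dataset_out(
    records_by_source: Dict[str, List[str]],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (held_out_name, pretrain_ids, eval_ids) once per source dataset.

    Each source is held out in turn: the model would be pre-trained on all
    remaining sources and evaluated on the held-out one.
    """
    for held_out in sorted(records_by_source):
        # Pool every record that does not belong to the held-out source.
        pretrain_ids = [
            rec
            for name, recs in sorted(records_by_source.items())
            if name != held_out
            for rec in recs
        ]
        yield held_out, pretrain_ids, list(records_by_source[held_out])


# Hypothetical placeholder sources, not the datasets used in the paper.
toy_sources = {
    "site_a": ["a1", "a2"],
    "site_b": ["b1"],
    "site_c": ["c1", "c2", "c3"],
}
splits = list(leave_one_dataset_out(toy_sources))
```

With nine source centers, as in OpenECG, this loop would produce nine splits, each holding out one center for evaluation.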

Problem

Research questions and friction points this paper is trying to address.

Evaluating ECG foundation models using 1.2 million public ECG records.

Comparing self-supervised learning methods for ECG model generalization.

Demonstrating public ECG data's efficacy in training robust AI models.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Benchmarks three self-supervised methods (SimCLR, BYOL, MAE) on ResNet-50 and Vision Transformer backbones

BYOL and MAE outperform SimCLR in generalization

Performance saturates at 60-70% of the total data for BYOL/MAE, while SimCLR requires more
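A data scaling analysis of this kind needs pre-training subsets of increasing size. The sketch below shows one common way to build them, assuming nested random subsets so that each larger subset contains the smaller ones. The paper's actual sampling scheme is not described here, and the helper `nested_subsets`, the fractions, and the record IDs are all hypothetical.

```python
import random
from typing import Dict, List, Sequence


def nested_subsets(
    record_ids: Sequence[str],
    fractions: Sequence[float],
    seed: int = 0,
) -> Dict[float, List[str]]:
    """Return one subset per fraction, each a prefix of the same shuffle.

    Because every subset is a prefix of one fixed shuffled list, a smaller
    fraction is always contained in a larger one, so differences between
    runs reflect data volume rather than a different random sample.
    """
    shuffled = list(record_ids)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    subsets: Dict[float, List[str]] = {}
    for frac in sorted(fractions):
        if not 0.0 < frac <= 1.0:
            raise ValueError(f"fraction must be in (0, 1], got {frac}")
        n = max(1, round(frac * len(shuffled)))
        subsets[frac] = shuffled[:n]
    return subsets


# Hypothetical record IDs and fractions, chosen only for illustration.
ids = [f"rec_{i}" for i in range(100)]
scaling_subsets = nested_subsets(ids, [0.1, 0.3, 0.6, 0.7, 1.0])
```

Each subset would then be used for a separate pre-training run, and downstream performance plotted against the fraction to see where the curve flattens.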

🔎 Similar Papers

ECG-FM: An Open Electrocardiogram Foundation Model