Looking Beyond Accuracy: A Holistic Benchmark of ECG Foundation Models

📅 2026-01-29
🤖 AI Summary
Current evaluations of ECG foundation models rely heavily on downstream task accuracy, which does not adequately capture how well their representations generalize and therefore limits confidence in their clinical reliability. To address this, the work proposes a comprehensive evaluation framework that adds representation-level analysis to performance-based evaluation, using SHAP to interpret feature importance and UMAP to visualize embedding structure, yielding a multidimensional benchmark that looks beyond accuracy alone. An extensive evaluation of multiple pretrained models in realistic settings, including datasets from different continents and data-scarce scenarios, shows that the framework reveals both similarities and differences in embedding patterns across models, offering a deeper understanding of how ECG foundation models generalize.

📝 Abstract
The electrocardiogram (ECG) is a cost-effective, highly accessible, and widely used diagnostic tool. With the advent of Foundation Models (FMs), AI-assisted ECG interpretation has begun to evolve, since FMs enable model reuse across different tasks through the embeddings they produce. However, to employ FMs responsibly, it is crucial to rigorously assess to what extent these embeddings generalize, particularly in error-sensitive domains such as healthcare. Although prior works have already addressed the benchmarking of ECG-expert FMs, they focus predominantly on downstream performance. To fill this gap, this study aims to provide an in-depth, comprehensive benchmarking framework for FMs, with a specific focus on ECG-expert ones. To this end, we introduce a benchmark methodology that complements performance-based evaluation with representation-level analysis, leveraging SHAP and UMAP techniques. Furthermore, we use this methodology to carry out an extensive evaluation of several ECG-expert FMs pretrained via state-of-the-art techniques, across different cross-continental datasets and data availability settings, including settings featuring data scarcity, a fairly common situation in real-world medical scenarios. Experimental results show that our benchmarking protocol provides rich insight into the patterns embedded by ECG-expert FMs, enabling a deeper understanding of their representational structure and generalizability.
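The abstract states the methodology only at a high level. As a minimal sketch of how its two views could fit together, the snippet below assumes frozen embeddings are scored with a logistic-regression linear probe (performance view, repeated at shrinking label fractions to mimic data scarcity) and that SHAP attributions are taken over the embedding dimensions (representation view). All names (`train_linear_probe`, `linear_shap`), the probe type, the label fractions, and the synthetic Gaussian embeddings are hypothetical illustrations, not the authors' protocol or data. SHAP values are computed in closed form, which is exact for a linear model only under the assumption of independent features, instead of calling the `shap` library; the UMAP projection is left out to keep the sketch dependency-free.

```python
import numpy as np

def train_linear_probe(emb, y, lr=0.5, epochs=500, l2=1e-3):
    """Hypothetical probe: logistic regression on frozen embeddings, full-batch gradient descent."""
    n, d = emb.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        z = np.clip(emb @ w + b, -30, 30)   # clip logits to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        err = p - y                          # gradient of the log-loss w.r.t. the logits
        w -= lr * (emb.T @ err / n + l2 * w)
        b -= lr * err.mean()
    return w, b

def probe_accuracy(emb, y, w, b):
    return float(np.mean((emb @ w + b > 0).astype(int) == y))

def linear_shap(emb, w, background):
    """Exact SHAP values of a linear model in log-odds space, assuming independent
    embedding dimensions: phi_ij = w_j * (x_ij - E[x_j])."""
    return (emb - background.mean(axis=0)) * w

# Synthetic stand-in for frozen FM embeddings (NOT real ECG data):
# only dimensions 0-3 carry label information.
rng = np.random.default_rng(0)
n, d = 2000, 64
emb = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[:4] = [3.0, -3.0, 2.0, 2.0]
y = (emb @ true_w + rng.logistic(size=n) > 0).astype(int)
train_x, train_y, test_x, test_y = emb[:1500], y[:1500], emb[1500:], y[1500:]

# Performance view: probe accuracy under increasing label availability.
results = {}
for frac in (0.01, 0.1, 1.0):
    k = int(frac * len(train_x))
    w, b = train_linear_probe(train_x[:k], train_y[:k])
    results[frac] = probe_accuracy(test_x, test_y, w, b)
    print(f"{frac:>5.0%} of labels ({k:4d} samples): test accuracy {results[frac]:.3f}")

# Representation view on the last (full-data) probe: which embedding
# dimensions drive its predictions?
phi = linear_shap(test_x, w, background=train_x)
importance = np.abs(phi).mean(axis=0)   # global importance = mean |SHAP| per dimension
top_dims = np.argsort(importance)[::-1][:4]
print("most important embedding dimensions:", sorted(top_dims.tolist()))
```

With `umap-learn` installed, the same `emb` matrix could be projected for visual inspection with `umap.UMAP(n_components=2).fit_transform(emb)`; comparing such projections, together with the attribution profiles, across different FMs is one way to read the embedding-level similarities and differences the abstract refers to.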
Problem

Research questions and friction points this paper is trying to address.

ECG Foundation Models
Benchmarking
Generalizability
Representation Analysis
Healthcare AI
Innovation

Methods, ideas, or system contributions that make the work stand out.

ECG Foundation Models
Holistic Benchmarking
Representation Analysis
SHAP
UMAP
Francesca Filice
Department of Mathematics and Computer Science, University of Calabria, Italy
Edoardo De Rose
Department of Mathematics and Computer Science, University of Calabria, Italy
Simone Bartucci
Department of Mathematics and Computer Science, University of Calabria, Italy; Department of Computer, Control and Management Engineering “Antonio Ruberti”, Sapienza University of Rome, Italy; DLVSystem Srl, Rende, Italy
Francesco Calimeri
Full Professor of Computer Science, University of Calabria (UNICAL), Italy
Artificial Intelligence, KRR, Logic Programming, Answer Set Programming, AI and Medicine
Simona Perri
Department of Mathematics and Computer Science, University of Calabria, Italy