🤖 AI Summary
ECG arrhythmia classification research suffers from inconsistent evaluation protocols, neglect of embedded-deployment constraints (e.g., latency, energy consumption, memory footprint), and insufficient clinical validation, hindering model comparability and real-world deployability. To address this, we propose E3C, a three-dimensional evaluation framework encompassing cross-patient validation, AAMI compliance, and resource-constrained deployment. We conduct a systematic review of the 2017–2024 literature, uniquely integrating inference efficiency, multi-center generalizability, and clinical standards into a unified assessment. Our analysis reveals that only a handful of state-of-the-art methods satisfy all E3C criteria; accordingly, we establish a standardized reporting guideline. E3C shifts the paradigm in ECG classification from accuracy-centric evaluation toward a balanced emphasis on clinical reliability and edge-deployment feasibility, thereby substantially improving reproducibility, cross-study comparability, and translational potential.
📝 Abstract
The classification of electrocardiogram (ECG) signals is crucial for early detection of arrhythmias and other cardiac conditions. However, despite advances in machine learning, many studies fail to follow standardized evaluation protocols, leading to inconsistencies in performance evaluation and real-world applicability. Additionally, hardware constraints essential for practical deployment in devices such as pacemakers, Holter monitors, and wearable ECG patches are often overlooked. Since real-world impact depends on feasibility in resource-constrained devices, ensuring efficient deployment is critical for continuous monitoring. This review systematically analyzes ECG classification studies published between 2017 and 2024, focusing on those adhering to the E3C (Embedded, Clinical, and Comparative Criteria), which comprise inter-patient paradigm implementation, compliance with Association for the Advancement of Medical Instrumentation (AAMI) recommendations, and model feasibility for embedded systems. While many studies report high accuracy, few properly consider patient-independent partitioning and hardware limitations. We identify state-of-the-art methods meeting the E3C criteria and conduct a comparative analysis of accuracy, inference time, energy consumption, and memory usage. Finally, we propose standardized reporting practices to ensure fair comparisons and practical applicability of ECG classification models. By addressing these gaps, this study aims to guide future research toward more robust and clinically viable ECG classification systems.
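The inter-patient paradigm mentioned above means partitioning by *patient* rather than by heartbeat, so that no subject contributes beats to both the training and test sets. A minimal sketch of such a split, using synthetic placeholder data (the array names, sizes, and the 70/30 ratio are illustrative assumptions, not from the review):

```python
import numpy as np

rng = np.random.default_rng(0)
n_beats = 1000
beats = rng.normal(size=(n_beats, 180))       # e.g. 180-sample beat windows
labels = rng.integers(0, 5, size=n_beats)     # AAMI classes N, S, V, F, Q as 0..4
patient_ids = rng.integers(0, 44, size=n_beats)  # one ID per source record

# Inter-patient split: partition the set of *patients*, not the beats,
# so no patient appears on both sides of the split.
patients = rng.permutation(np.unique(patient_ids))
n_test_patients = int(0.3 * len(patients))
test_mask = np.isin(patient_ids, patients[:n_test_patients])

X_train, y_train = beats[~test_mask], labels[~test_mask]
X_test, y_test = beats[test_mask], labels[test_mask]

# Sanity check: the two partitions share no patient.
assert set(patient_ids[~test_mask]).isdisjoint(patient_ids[test_mask])
```

A naive per-beat random split lets a model memorize patient-specific morphology, inflating reported accuracy, which is exactly the comparability problem the E3C criteria target.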