🤖 AI Summary
Problem: Existing ECG multimodal large language models (MLLMs) focus mainly on a single task, report generation, and typically accept only a single short-duration (10 s) 12-lead ECG. Available ECG question-answering datasets also lack task diversity and clinically representative scenarios.
Method: We propose anyECG-chat, a general-purpose ECG MLLM that supports multiple tasks (report generation, abnormal waveform localization, and open-ended question answering) and flexible input formats (single or multiple ECGs, short or long duration, full or reduced leads). The model accepts dynamic-length ECG inputs and jointly interprets multiple ECGs in one query. We also construct the anyECG dataset, which covers standard hospital ECGs, long-duration reduced-lead ECGs recorded at home, and multi-ECG comparison scenarios, and we train the model with a three-stage curriculum training recipe.
Contribution/Results: Experiments show that anyECG-chat supports a range of practical clinical scenarios: standard report generation, abnormal waveform localization in long-duration reduced-lead ECGs recorded at home, and comparative analysis of multiple ECGs.
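The summary does not say how the encoder handles inputs of varying duration and lead count. One common approach is to cut the signal into fixed-length time patches, so the number of ECG tokens grows with recording length, and to zero-fill absent leads while keeping a lead mask. The sketch below illustrates only that general idea. The function `to_patch_tokens`, the 500 Hz sample rate, the 1 s patch length, and the lead ordering are hypothetical choices, not details taken from anyECG-chat.

```python
import math

# Hypothetical standard 12-lead ordering; the paper's ordering is not stated.
STANDARD_LEADS = ["I", "II", "III", "aVR", "aVL", "aVF",
                  "V1", "V2", "V3", "V4", "V5", "V6"]


def to_patch_tokens(leads, sample_rate=500, patch_seconds=1.0):
    """Split a variable-length, possibly reduced-lead ECG into fixed-size patches.

    Assumption (illustrative only): `leads` is a non-empty dict mapping lead
    name -> list of samples, and all recorded leads have the same length.
    Returns (patches, lead_mask): patches[t] is a 12 x patch_len block for
    time step t, and lead_mask[i] is True when lead i was actually recorded.
    """
    patch_len = int(sample_rate * patch_seconds)
    n_samples = len(next(iter(leads.values())))
    # The token count grows with duration instead of being fixed at 10 s.
    n_patches = math.ceil(n_samples / patch_len)
    lead_mask = [name in leads for name in STANDARD_LEADS]

    patches = []
    for t in range(n_patches):
        start, end = t * patch_len, (t + 1) * patch_len
        block = []
        for name in STANDARD_LEADS:
            segment = leads.get(name, [])[start:end]
            # Zero-pad missing leads and the ragged final patch.
            segment = segment + [0.0] * (patch_len - len(segment))
            block.append(segment)
        patches.append(block)
    return patches, lead_mask
```

For a 10 s 12-lead hospital ECG at 500 Hz (5000 samples per lead), this yields 10 patches with every lead marked present. A 2.5 s home recording containing only lead II yields 3 patches, the last one half zero-padded, with only lead II marked present. A real encoder would then embed each patch and use the mask to ignore the zero-filled leads.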
📝 Abstract
The advent of multimodal large language models (MLLMs) has sparked interest in their application to electrocardiogram (ECG) analysis. However, existing ECG-focused MLLMs concentrate mainly on report generation and are often limited to a single 12-lead, short-duration (10 s) ECG input, thereby underutilizing the potential of MLLMs. We therefore aim to develop an MLLM for ECG analysis that supports a broader range of tasks and more flexible ECG inputs. Existing ECG-QA datasets, however, offer little task diversity. To address this gap, we first construct the anyECG dataset, which encompasses a wide variety of tasks, including report generation, abnormal waveform localization, and open-ended question answering. In addition to standard hospital ECGs, it includes long-duration reduced-lead ECGs recorded in home environments and the multiple-ECG comparison scenarios commonly encountered in clinical practice. We then propose the anyECG-chat model, which supports dynamic-length ECG inputs and multiple ECG inputs, and train it on the anyECG dataset using a three-stage curriculum training recipe. A comprehensive evaluation demonstrates that anyECG-chat supports various practical application scenarios, including not only common report generation tasks but also abnormal waveform localization for long-duration reduced-lead ECGs in home environments and comprehensive comparative analysis of multiple ECGs.
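The abstract states that anyECG-chat accepts several ECGs in one query but does not describe how they are arranged in the model input. One minimal way to do this, assuming a hypothetical `<ecg_k>` placeholder convention, is to split the prompt at each placeholder and reserve a block of ECG-token slots whose size depends on that recording's length. The model would then fill text segments with word embeddings and ECG segments with encoder outputs. The function `interleave` and the placeholder syntax are illustrative only.

```python
import re

# Hypothetical placeholder syntax; the paper's prompt format is not specified.
PLACEHOLDER = re.compile(r"<ecg_(\d+)>")


def interleave(prompt, ecg_token_counts):
    """Expand each <ecg_k> placeholder into ecg_token_counts[k - 1] ECG slots.

    Returns a list of ("text", str) and ("ecg", k, n_tokens) segments in
    prompt order, so ECGs of different lengths can share one input sequence.
    """
    segments, pos = [], 0
    for match in PLACEHOLDER.finditer(prompt):
        if match.start() > pos:
            segments.append(("text", prompt[pos:match.start()]))
        k = int(match.group(1))  # 1-based index of the referenced ECG
        segments.append(("ecg", k, ecg_token_counts[k - 1]))
        pos = match.end()
    if pos < len(prompt):
        segments.append(("text", prompt[pos:]))
    return segments
```

For example, `interleave("Compare <ecg_1> with <ecg_2>. What changed?", [10, 60])` returns `[("text", "Compare "), ("ecg", 1, 10), ("text", " with "), ("ecg", 2, 60), ("text", ". What changed?")]`. A short hospital ECG and a longer home recording therefore occupy different numbers of slots within the same query.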