🤖 AI Summary
This work proposes ELF, an encoder-free ECG-language model (ELM) that applies the encoder-free design of recent vision-language models to electrocardiogram (ECG) language modeling. Most existing ELMs rely on pretrained ECG encoders, which adds architectural and training complexity. ELF instead replaces the ECG encoder with a single projection layer trained jointly with the large language model, so ECG signals and text are processed end to end in one multimodal model. Across five datasets, ELF matches or exceeds state-of-the-art ELMs that use far more complex encoders and training pipelines, and adding architectural biases to the projection does not clearly improve on the single linear layer. The authors also find that ELF, and potentially other ELMs, often rely more on benchmark artifacts and language priors than on ECG-derived information, which points to limitations in current evaluation practices.
📝 Abstract
ECG-Language Models (ELMs) extend recent progress in Multimodal Large Language Models (MLLMs) to automated ECG interpretation. However, most ELMs follow Vision-Language Model (VLM) designs and depend on pretrained ECG encoders, adding architectural and training complexity. Inspired by encoder-free VLMs, we introduce ELF, an encoder-free ELM that replaces the ECG encoder with a single projection layer trained jointly with the LLM. Across five datasets, ELF matches or exceeds state-of-the-art ELMs that use far more complex encoders and training pipelines. We also test whether adding architectural biases to ELF improves performance and find that the single linear projection remains competitive. Finally, we show that ELF, and potentially other ELMs, often rely more on benchmark artifacts and language priors than on ECG-derived information, highlighting limitations in current evaluation practices and ELM design. All data and code are available at https://github.com/willxxy/ECG-Bench.
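To make the "single projection layer" idea concrete, the following is a minimal NumPy sketch of how an ECG might be mapped into an LLM's embedding space without an encoder. It is an illustration under stated assumptions, not the authors' implementation: the patching scheme, the function names `patchify_ecg` and `project_ecg`, and all sizes (12 leads, 2500 samples, patch length 50, embedding width 64) are hypothetical, and the random weights stand in for parameters that would be trained jointly with the LLM.

```python
import numpy as np

def patchify_ecg(ecg, patch_len):
    """Split a (leads, samples) ECG into non-overlapping time patches.

    Returns an array of shape (num_patches, leads * patch_len).
    The patching scheme is an assumption made for this sketch.
    """
    leads, samples = ecg.shape
    num_patches = samples // patch_len  # trailing samples that do not fill a patch are dropped
    trimmed = ecg[:, :num_patches * patch_len]
    # (leads, num_patches, patch_len) -> (num_patches, leads, patch_len)
    patches = trimmed.reshape(leads, num_patches, patch_len).transpose(1, 0, 2)
    return patches.reshape(num_patches, leads * patch_len)

def project_ecg(ecg, weight, bias, patch_len):
    """Map ECG patches to embedding vectors with one linear layer."""
    return patchify_ecg(ecg, patch_len) @ weight + bias

# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
leads, samples, patch_len, d_model = 12, 2500, 50, 64

ecg = rng.standard_normal((leads, samples))  # stand-in for a real recording
weight = rng.standard_normal((leads * patch_len, d_model)) * 0.02
bias = np.zeros(d_model)

ecg_tokens = project_ecg(ecg, weight, bias, patch_len)  # one embedding per patch
text_tokens = rng.standard_normal((10, d_model))        # stand-in for text embeddings

# The LLM would consume ECG and text embeddings as a single sequence.
sequence = np.concatenate([ecg_tokens, text_tokens], axis=0)
```

In this sketch the only ECG-specific parameters are `weight` and `bias`, which is what distinguishes an encoder-free design from one that first runs the signal through a separately pretrained encoder.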