🤖 AI Summary
This work addresses the challenge of achieving both generalization and interpretability in peak detection across multimodal physiological signals (ECG, PPG, BCG, and BSG), where conventional deep learning approaches often suffer from opaque decision-making that impedes clinical validation. The authors propose Peak-Detector, an interpretable peak detection framework built on instruction-tuned large language models (LLMs). The approach uses a "peak representation" technique to compress time-series data while preserving critical events, followed by a two-stage optimization that combines supervised fine-tuning with multi-objective reinforcement learning. Fine-tuning on a custom Peak-Explanation dataset lets the model generate human-readable rationales for its predictions. Evaluated on seven datasets spanning four signal modalities (six public benchmarks plus one real-world cohort), the method achieves best or tied-best detection performance under clinically relevant temporal tolerance, and its rationales support error attribution and expert verification.
📝 Abstract
Accurate peak detection across diverse cardiac physiological signals, including the Electrocardiogram (ECG), Photoplethysmogram (PPG), Ballistocardiogram (BCG), and Bodyseismography (BSG), is fundamental for cardiovascular monitoring but is often hindered by artifacts and signal variability. Conventional algorithms are typically engineered with expert knowledge for a single signal modality, limiting their generalizability. Conversely, deep learning-based methods often lack interpretability, limiting transparency for expert verification and hindering expert-computer interaction. To address these limitations, we introduce Peak-Detector, a novel framework that leverages instruction-tuned Large Language Models (LLMs) for robust, cross-modal, and explainable peak detection. A core innovation of our framework is a "peak-representation" technique that transforms time-series data into a condensed format, preserving critical event information while significantly reducing signal length. This representation provides a crucial inductive bias, guiding the LLM to reason over physiologically meaningful events rather than raw, noisy data. The model is optimized through a two-stage process: supervised fine-tuning (SFT) followed by reinforcement learning (RL) with a multi-objective reward function. The model's self-explanation capabilities are cultivated by fine-tuning on a custom-built Peak-Explanation dataset. Across four modalities (ECG, PPG, BCG, and BSG) spanning seven datasets (six public benchmarks plus one real-world cohort), Peak-Detector demonstrates strong cross-modal performance, achieving best or tied-best detection under clinically relevant temporal tolerance. Beyond accuracy, the generated rationales surface failure modes and support verification and error analysis.
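To make the two ideas above more concrete, here is a minimal Python sketch of (a) compressing a signal into a short list of candidate events and (b) scoring detected peaks under a temporal tolerance. It is an illustration only, not the authors' implementation: the abstract does not specify how the peak representation is built, so the local-maximum rule, the text serialization, the greedy matching, and every name here (`peak_representation`, `to_prompt`, `score_with_tolerance`, the 50 ms tolerance) are assumptions made for the example.

```python
from typing import List, Tuple


def peak_representation(signal: List[float], fs: float) -> List[Tuple[float, float]]:
    """Assumed compression rule: keep only local maxima as (time_s, amplitude)."""
    events = []
    for i in range(1, len(signal) - 1):
        # A sample is kept if it rises from the left and does not rise to the right.
        if signal[i - 1] < signal[i] >= signal[i + 1]:
            events.append((round(i / fs, 3), signal[i]))
    return events


def to_prompt(events: List[Tuple[float, float]]) -> str:
    """Hypothetical text serialization of the events for an LLM prompt."""
    return "; ".join(f"t={t:.3f}s a={a:.2f}" for t, a in events)


def score_with_tolerance(
    pred: List[float], ref: List[float], tol: float
) -> Tuple[float, float, float]:
    """Greedy one-to-one matching of predicted and reference peak times.

    A prediction counts as a true positive if it lies within +/- tol seconds
    of a not-yet-matched reference peak. Returns (precision, recall, f1).
    """
    pred, ref = sorted(pred), sorted(ref)
    i = j = tp = 0
    while i < len(pred) and j < len(ref):
        d = pred[i] - ref[j]
        if abs(d) <= tol:
            tp += 1
            i += 1
            j += 1
        elif d < 0:
            i += 1  # prediction is too early to match any remaining reference
        else:
            j += 1  # reference peak was missed
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy signal sampled at 4 Hz (values are made up for illustration).
toy = [0.0, 1.0, 0.0, 0.2, 0.1, 0.0, 2.0, 0.0]
events = peak_representation(toy, fs=4.0)
prompt = to_prompt(events)
# Treat every candidate as a detected peak and score against two reference peaks.
scores = score_with_tolerance([t for t, _ in events], [0.26, 1.52], tol=0.05)
```

In this toy run, eight samples reduce to three candidate events, which is the kind of length reduction the abstract describes. A real pipeline would presumably also need to handle plateaus, minima, and noise, and would let the model decide which candidates are true peaks.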