🤖 AI Summary
Existing PPG foundation models rely on high-quality or scenario-specific pretraining data. This limits their generalization to real-world settings, where everyday wearable devices produce noisy signals. This work proposes a multimodal contrastive learning framework that uses high-fidelity physiological signals recorded alongside PPG in ICU environments, such as ECG and respiration, as supervisory cues for selecting contrastive samples from noisy PPG segments. This enables self-supervised representation learning without clean PPG pretraining data. By integrating multimodal physiological signals into PPG foundation model pretraining for the first time, the method improves model robustness and generalization. Pretrained on one-third as many subjects as existing state-of-the-art approaches, it improves performance on 14 of 15 downstream tasks, including daily activity monitoring and heart rate prediction.
📝 Abstract
Photoplethysmography (PPG), a non-invasive measure of changes in blood volume, is widely used in both wearable devices and clinical settings. Recent PPG foundation models either pretrain on open-source ICU datasets using paradigms that require curated data, which complicates generalization to field-like data, or pretrain on closed-source field-like PPG data. In contrast, we propose a PPG foundation model that requires neither high-quality nor field-like pretraining data; instead, it leverages the electrocardiogram and respiratory signals that accompany PPG in ICU datasets to select contrastive samples during pretraining. This approach allows the model to retain and learn from noisy PPG segments, improving robustness at inference. Our model, pretrained on one-third as many subjects as existing state-of-the-art approaches, improves performance on 14 of 15 diverse downstream tasks, including field-like daily activity and heart rate prediction. Our results demonstrate that multimodal supervision can integrate complementary physiological information to improve the robustness of PPG foundation models and enhance their generalization to consumer-grade data.
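The abstract does not specify how the ECG and respiratory channels decide which segments count as contrastive positives. The following is a minimal sketch of one plausible scheme, not necessarily the authors' method. It assumes that each window carries a heart rate derived from its ECG and a respiration rate derived from its respiratory channel. The window whose auxiliary values are closest to an anchor's is treated as that anchor's positive, and the PPG embeddings are scored with a standard InfoNCE loss. The `Segment` class, the scaling constants `hr_scale` and `rr_scale`, and the temperature are hypothetical illustrations.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    """One pretraining window. All field names here are hypothetical."""
    seg_id: int
    heart_rate: float        # bpm, assumed to be derived from the paired ECG
    resp_rate: float         # breaths/min, assumed to come from the respiration channel
    embedding: list[float]   # PPG encoder output for this (possibly noisy) window


def aux_distance(a: Segment, b: Segment,
                 hr_scale: float = 10.0, rr_scale: float = 4.0) -> float:
    """Distance in the auxiliary (ECG/respiration) space; the scales are assumed values."""
    return math.hypot((a.heart_rate - b.heart_rate) / hr_scale,
                      (a.resp_rate - b.resp_rate) / rr_scale)


def select_positive(anchor: Segment, pool: list[Segment]) -> Segment:
    """Pick the window whose auxiliary signals best match the anchor.
    PPG quality is never checked, so noisy windows remain eligible."""
    candidates = [s for s in pool if s.seg_id != anchor.seg_id]
    return min(candidates, key=lambda s: aux_distance(anchor, s))


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def info_nce(anchor: Segment, pool: list[Segment], temperature: float = 0.1) -> float:
    """InfoNCE loss: the auxiliary-selected window is the positive and
    every other non-anchor window in the pool acts as a negative."""
    positive = select_positive(anchor, pool)
    others = [s for s in pool if s.seg_id != anchor.seg_id]
    logits = [cosine(anchor.embedding, s.embedding) / temperature for s in others]
    pos_logit = cosine(anchor.embedding, positive.embedding) / temperature
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denom - pos_logit


# Toy pool: window 1 is physiologically closest to window 0.
pool = [
    Segment(0, 72.0, 14.0, [1.0, 0.1]),
    Segment(1, 74.0, 15.0, [0.9, 0.2]),
    Segment(2, 110.0, 22.0, [-0.5, 1.0]),
    Segment(3, 60.0, 10.0, [0.2, -1.0]),
]
positive = select_positive(pool[0], pool)
loss = info_nce(pool[0], pool)
print(positive.seg_id, round(loss, 4))
```

Positives are chosen from the auxiliary channels rather than from the PPG waveform. As a result, a noisy PPG window can still be paired with a physiologically matched partner instead of being filtered out. In a real pipeline, the embeddings would come from the PPG encoder being trained, and the loss would be averaged over a batch.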