A robust PPG foundation model using multimodal physiological supervision

📅 2026-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing PPG foundation models rely on high-quality or scenario-specific pretraining data, which limits their generalization to real-world settings where everyday wearable devices produce noisy signals. This work proposes a multimodal contrastive learning framework that uses the ECG and respiratory signals recorded alongside PPG in ICU datasets as supervisory cues for selecting contrastive samples. Because sample selection no longer depends on clean PPG, the model can retain and learn from noisy PPG segments during pretraining. Pretrained on one-third as many subjects as existing state-of-the-art approaches, the model improves performance on 14 of 15 downstream tasks, including field-like daily activity and heart rate prediction.
📝 Abstract
Photoplethysmography (PPG), a non-invasive measure of changes in blood volume, is widely used in both wearable devices and clinical settings. Recent PPG foundation models either use open-source ICU datasets with pretraining paradigms that require curated data and thus complicate generalization to field-like data, or use closed-source field-like PPG data. In contrast, we propose a PPG foundation model that does not require high-quality or field-like pretraining data, and instead leverages accompanying electrocardiogram and respiratory signals in ICU datasets to select contrastive samples during pretraining. Our approach allows the model to retain and learn from noisy PPG segments, improving robustness at inference. Our model, pretrained on 3x fewer subjects than existing state-of-the-art approaches, achieves performance improvements on 14 out of 15 diverse downstream tasks, including field-like daily activity and heart rate prediction. Our results demonstrate that multimodal supervision can integrate complementary physiological information to improve the robustness of PPG foundation models and enhance their generalization to consumer-grade data.
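The abstract does not spell out how the ECG and respiratory signals select contrastive samples, so the following is only a minimal sketch of one plausible reading, not the authors' method. It assumes each PPG segment comes with a heart rate derived from ECG and a respiratory rate, treats two segments as a positive pair when both values agree within a tolerance, and scores embeddings with a generic supervised-contrastive (InfoNCE-style) loss. The function names, the tolerances `hr_tol` and `rr_tol`, and the `temperature` value are all hypothetical.

```python
import numpy as np


def select_positive_pairs(heart_rate, resp_rate, hr_tol=2.0, rr_tol=1.0):
    """Boolean matrix P with P[i, j] True when segments i and j have similar
    heart rate and respiratory rate (hypothetical selection rule)."""
    hr = np.asarray(heart_rate, dtype=float)
    rr = np.asarray(resp_rate, dtype=float)
    hr_close = np.abs(hr[:, None] - hr[None, :]) <= hr_tol
    rr_close = np.abs(rr[:, None] - rr[None, :]) <= rr_tol
    positives = hr_close & rr_close
    np.fill_diagonal(positives, False)  # a segment is not its own positive
    return positives


def contrastive_loss(embeddings, positives, temperature=0.1):
    """Generic supervised-contrastive loss over PPG segment embeddings.

    Positives are chosen from the other modalities, so noisy PPG segments
    stay in the batch instead of being filtered out beforehand.
    """
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0
    if not has_pos.any():
        return 0.0  # no anchor has a positive in this batch

    z = np.asarray(embeddings, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # cosine similarity
    logits = z @ z.T / temperature
    np.fill_diagonal(logits, -np.inf)  # exclude self-similarity

    # Row-wise log-softmax over all other segments in the batch.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

    # Mean log-probability of the positives for each anchor that has any.
    pos_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[has_pos] / n_pos[has_pos]).mean())


# Toy batch: segments 0/1 and 2/3 share similar physiology.
heart_rate = [60.0, 61.0, 90.0, 91.0]   # beats per minute, from ECG
resp_rate = [12.0, 12.5, 18.0, 18.2]    # breaths per minute
positives = select_positive_pairs(heart_rate, resp_rate)

aligned = np.array([[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.1, 1.0]])
misaligned = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.1], [0.1, 1.0]])
loss_aligned = contrastive_loss(aligned, positives)
loss_misaligned = contrastive_loss(misaligned, positives)
```

In this toy batch the loss is lower when the embeddings of physiologically similar segments point in the same direction, which is the behavior a pretraining objective of this kind would reward. A real pipeline would replace the fixed arrays with encoder outputs and would need its own rule for deriving the rates from raw ECG and respiration.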
Problem

Research questions and friction points this paper is trying to address.

PPG foundation model
generalization
noisy PPG data
multimodal supervision
consumer-grade data
Innovation

Methods, ideas, or system contributions that make the work stand out.

PPG foundation model
multimodal physiological supervision
contrastive learning
robustness
generalization
Eloy Geenjaar
Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA; Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, USA
Vince Calhoun
Department of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA; Tri-Institutional Center for Translational Research in Neuroimaging and Data Science (TReNDS), Georgia State University, Georgia Institute of Technology, Emory University, Atlanta, USA
Scott Daly
Dolby Laboratories, San Francisco, USA
Gouthaman KV
Senior Researcher @ Dolby Laboratories, PhD @ CSE, IIT Madras
Multimodal, Computer Vision, NLP, Music Generation, AI Codec
Lie Lu
Dolby Laboratories
Machine learning, audio/multimedia processing, understanding and generation
Trisha Mittal
Sr. Researcher at Dolby Laboratories
Artificial Intelligence, Machine Learning, Deep Learning, Affective Computing
Daniel P. Darcy
Dolby Laboratories, San Francisco, USA