🤖 AI Summary
To address the challenges of changing frame rates and audio-video desynchronization in online cardiac monitoring (OCM) on video streaming platforms, this paper proposes CardioLive, the first real-time OCM system tailored for streaming services. Methodologically, the authors design CardioNet, a multimodal neural network that models heart-rate signals in both the time and frequency domains by fusing subtle video-based motion cues with audio-derived pulse features. They further introduce a Service-On-Demand (SoD) plugin-based middleware architecture that enables low-latency streaming inference and deployment across heterogeneous platforms. Experiments show a mean absolute error (MAE) of 1.79 BPM in heart-rate estimation, outperforming the video-only and audio-only baselines by 69.2% and 81.2%, respectively. The system achieves average throughputs of 115.97 FPS on Zoom and 98.16 FPS on YouTube, supporting applications such as remote health monitoring, affective computing, and deepfake detection.
📝 Abstract
Online Cardiac Monitoring (OCM) is emerging as a compelling enhancement for next-generation video streaming platforms. It enables applications including remote health monitoring, online affective computing, and deepfake detection. Yet the physiological information carried in video streams has long been neglected. In this paper, we present the design and implementation of CardioLive, the first online cardiac monitoring system for video streaming platforms. We leverage the naturally co-occurring video and audio streams and devise CardioNet, the first audio-visual network to learn the cardiac series. It incorporates multiple unique designs to extract temporal and spectral features, ensuring robust performance under realistic video streaming conditions. To enable Service-On-Demand online cardiac monitoring, we implement CardioLive as a plug-and-play middleware service and develop systematic solutions to practical issues, including changing FPS and unsynchronized streams. Extensive experiments demonstrate the effectiveness of our system. We achieve a Mean Absolute Error (MAE) of 1.79 BPM, outperforming the video-only and audio-only solutions by 69.2% and 81.2%, respectively. Our CardioLive service achieves average throughputs of 115.97 and 98.16 FPS when deployed in Zoom and YouTube, respectively. We believe our work opens up new applications for video streaming systems. We will release the code soon.
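The abstract does not describe how CardioLive handles changing FPS. As a purely illustrative sketch, one common way to cope with a variable frame rate is to resample timestamped per-frame values onto a uniform time grid by linear interpolation before any temporal or spectral analysis. The function name `resample_uniform`, its parameters, and the choice of linear interpolation are assumptions made for this example and are not taken from the paper.

```python
from bisect import bisect_right


def resample_uniform(timestamps, values, target_fps):
    """Linearly interpolate irregularly timed samples onto a uniform grid.

    Hypothetical helper for illustration only; not the CardioLive method.
    timestamps: strictly increasing frame times in seconds.
    values:     one scalar per frame (e.g. a per-frame intensity average).
    target_fps: desired uniform sampling rate in frames per second.
    """
    if len(timestamps) != len(values) or len(timestamps) < 2:
        raise ValueError("need at least two (timestamp, value) pairs")
    if any(b <= a for a, b in zip(timestamps, timestamps[1:])):
        raise ValueError("timestamps must be strictly increasing")

    step = 1.0 / target_fps
    start, end = timestamps[0], timestamps[-1]
    # Number of grid points that fit in [start, end]; the small epsilon
    # guards against floating-point round-off at the upper boundary.
    n_out = int((end - start) / step + 1e-9) + 1

    out_t, out_v = [], []
    for k in range(n_out):
        t = start + k * step
        # Index of the last input sample at or before t, clamped so that
        # a right-hand neighbour always exists.
        i = min(bisect_right(timestamps, t) - 1, len(timestamps) - 2)
        t0, t1 = timestamps[i], timestamps[i + 1]
        w = (t - t0) / (t1 - t0)
        out_t.append(t)
        out_v.append(values[i] + w * (values[i + 1] - values[i]))
    return out_t, out_v


# Frames arriving at uneven intervals (a frame near 0.2 s was dropped).
grid_t, grid_v = resample_uniform(
    [0.0, 0.1, 0.3, 0.4], [0.0, 1.0, 3.0, 4.0], target_fps=10
)
```

With a uniform grid in place, audio and video features can be indexed by the same clock, which is one plausible starting point for aligning unsynchronized streams.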