Periodic-MAE: Periodic Video Masked Autoencoder for rPPG Estimation

📅 2025-06-26

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses contactless heart rate measurement through self-supervised learning of robust periodic physiological signal representations from unlabeled facial videos. The learned representations capture subtle variations in skin chrominance, which are then used to estimate remote photoplethysmography (rPPG) signals. The authors propose a video-level, periodicity-aware masking strategy that jointly optimizes temporal frame reconstruction and sparsity constraints in the physiological frequency band (0.75–4 Hz). They present this as the first rPPG-oriented video masked autoencoder (MAE) pretraining framework. The method requires no labels and uses frequency-domain priors to steer the model toward quasi-periodic physiological dynamics. It was evaluated on four major benchmarks (PURE, UBFC-rPPG, MMPD, and V4V). On these benchmarks it reports significant gains in rPPG accuracy, an average 12.6% reduction in mean absolute error, along with state-of-the-art cross-dataset generalization. The implementation is publicly available.
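The 0.75–4 Hz band quoted above corresponds to 45–240 beats per minute. The Python sketch below illustrates how such a band is commonly used downstream. It reads a heart rate from the strongest in-band spectral peak of a 1-D rPPG trace and scores the result with mean absolute error. The function names, the 30 fps frame rate, and the synthetic signal are assumptions made for illustration. This is not code from the Periodic-MAE repository.

```python
import numpy as np

# Physiological band quoted in the summary: 0.75-4 Hz, i.e. 45-240 beats per minute.
LOW_HZ, HIGH_HZ = 0.75, 4.0

def estimate_hr_bpm(signal: np.ndarray, fs: float) -> float:
    """Hypothetical helper: heart rate (bpm) from the strongest peak inside the band."""
    x = signal - signal.mean()                    # remove the DC offset
    power = np.abs(np.fft.rfft(x)) ** 2           # one-sided power spectrum
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)   # frequency (Hz) of each bin
    band = (freqs >= LOW_HZ) & (freqs <= HIGH_HZ) # ignore drift and high-frequency noise
    peak_hz = freqs[band][np.argmax(power[band])]
    return 60.0 * peak_hz

def mean_absolute_error(pred, true) -> float:
    """MAE in bpm between predicted and reference heart rates."""
    return float(np.mean(np.abs(np.asarray(pred) - np.asarray(true))))

fs = 30.0                                         # assumed 30 fps camera
t = np.arange(300) / fs                           # 10 s clip, 300 samples
# Synthetic trace: 1.2 Hz pulse (72 bpm) plus a stronger 0.2 Hz drift outside the band.
trace = np.sin(2 * np.pi * 1.2 * t) + 3 * np.sin(2 * np.pi * 0.2 * t)
hr = estimate_hr_bpm(trace, fs)
print(hr, mean_absolute_error([hr], [72.0]))
```

Restricting the peak search to the band is what keeps the large low-frequency drift from being reported as the heart rate.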

📝 Abstract

In this paper, we propose a method that learns a general representation of periodic signals from unlabeled facial videos by capturing subtle changes in skin tone over time. The proposed framework uses a video masked autoencoder to learn a high-dimensional spatio-temporal representation of the facial region through self-supervised learning. Capturing quasi-periodic signals in the video is crucial for remote photoplethysmography (rPPG) estimation. To account for signal periodicity, we apply frame masking based on video sampling, which allows the model to capture resampled quasi-periodic signals during the pre-training stage. The framework also incorporates physiological bandlimit constraints. These constraints exploit the fact that physiological signals are sparse within their frequency bandwidth, which gives the model pulse cues. The pre-trained encoder is then transferred to the rPPG task, where it extracts physiological signals from facial videos. We evaluate the proposed method through extensive experiments on the PURE, UBFC-rPPG, MMPD, and V4V datasets. The results show significant performance improvements, particularly in challenging cross-dataset evaluations. Our code is available at https://github.com/ziiho08/Periodic-MAE.
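One plausible reading of frame masking based on video sampling works in two steps. First, the clip is resampled in time at some rate, which shifts the apparent pulse frequency. Second, whole frames are hidden so the encoder must reconstruct them. The sketch below shows that idea using nearest-frame resampling. The name resample_and_mask_frames, the 1.5x rate, and the 75% mask ratio are hypothetical choices, not the authors' settings. The repository linked above is the authority on the actual procedure.

```python
import numpy as np

def resample_and_mask_frames(video, out_len, rate, mask_ratio, rng):
    """Hypothetical sketch: temporally resample a clip, then hide whole frames.

    video: array of shape (T, H, W, C). Read at the original frame rate, the
    output plays `rate` times faster, so a pulse at f Hz appears at rate * f Hz.
    """
    total = video.shape[0]
    span = rate * (out_len - 1)                   # source frames covered by the output
    if span > total - 1:
        raise ValueError("clip too short for this rate")
    start = rng.uniform(0, total - 1 - span)      # random temporal offset
    idx = np.rint(start + rate * np.arange(out_len)).astype(int)  # nearest-frame resampling
    clip = video[idx]
    n_masked = int(round(mask_ratio * out_len))
    masked = np.zeros(out_len, dtype=bool)
    masked[rng.choice(out_len, size=n_masked, replace=False)] = True  # frames to reconstruct
    visible = clip[~masked]                       # frames the encoder would see
    return visible, clip, masked

rng = np.random.default_rng(0)
video = np.random.default_rng(1).random((300, 8, 8, 3))  # toy 10 s clip at 30 fps
visible, clip, masked = resample_and_mask_frames(video, out_len=160, rate=1.5,
                                                 mask_ratio=0.75, rng=rng)
print(visible.shape, clip.shape, int(masked.sum()))  # (40, 8, 8, 3) (160, 8, 8, 3) 120
```

Choosing the rate range so that rate * f stays inside the physiological band would keep the resampled pulse plausible. That constraint is also an assumption here.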

Problem

Research questions and friction points this paper is trying to address.

Estimating rPPG from unlabeled facial videos

Learning periodic signals via masked autoencoder

Improving cross-dataset rPPG performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Video masked autoencoder for spatio-temporal representation

Frame masking captures quasi-periodic signals

Physiological bandlimit constraints enhance pulse cues
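The bandlimit prior can be expressed as two scalar penalties on the power spectrum of a predicted pulse signal. The first measures how much power falls outside 0.75–4 Hz. The second measures how much in-band power is spread away from the dominant peak. The NumPy sketch below computes both. The function name band_priors, the peak half-width, and the exact formulas are assumptions. An actual pretraining objective would be written in a differentiable framework rather than NumPy.

```python
import numpy as np

def band_priors(signal, fs, low_hz=0.75, high_hz=4.0, peak_halfwidth_hz=0.1):
    """Hypothetical bandlimit and sparsity penalties for a 1-D pulse estimate.

    Returns (bandwidth_loss, sparsity_loss), both in [0, 1]:
      bandwidth_loss: share of spectral power outside [low_hz, high_hz];
      sparsity_loss: share of in-band power NOT concentrated near the in-band peak.
    """
    x = signal - np.mean(signal)
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    total = power.sum() + 1e-12                   # epsilon avoids 0/0 for a flat signal
    band_power = power[in_band].sum() + 1e-12
    bandwidth_loss = 1.0 - power[in_band].sum() / total
    peak_hz = freqs[in_band][np.argmax(power[in_band])]
    near_peak = in_band & (np.abs(freqs - peak_hz) <= peak_halfwidth_hz)
    sparsity_loss = 1.0 - power[near_peak].sum() / band_power
    return float(bandwidth_loss), float(sparsity_loss)

fs = 30.0
t = np.arange(300) / fs
print(band_priors(np.sin(2 * np.pi * 1.2 * t), fs))  # both near 0: a single in-band tone
print(band_priors(np.sin(2 * np.pi * 1.0 * t) + np.sin(2 * np.pi * 2.0 * t), fs))  # sparsity near 0.5
```

A single clean in-band tone drives both penalties toward zero. Power outside the band raises bandwidth_loss, and power split across several in-band frequencies raises sparsity_loss.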

🔎 Similar Papers

VideoPrism: A Foundational Visual Encoder for Video Understanding