🤖 AI Summary
Transformers applied to remote photoplethysmography (rPPG) signal extraction suffer from coarse-grained feature modeling, noise sensitivity, and poor generalization, problems that stem from their quadratic computational complexity. To address this, the paper proposes a periodic sparse attention mechanism coupled with a fusion-guided backbone network. Specifically, it introduces physiology-informed periodic sparse attention that leverages prior knowledge of cardiac and respiratory rhythms, and incorporates a pre-attention stage to enable fine-grained temporal modeling. Additionally, a fusion stem module is designed to guide self-attention toward the salient physiological features critical for rPPG estimation. Evaluated within an end-to-end video-to-rPPG regression framework, the method significantly improves signal-to-noise ratio and cross-domain generalization, achieving state-of-the-art performance across multiple benchmark datasets, including UBFC-RPPG, PURE, and COHFACE. The source code is publicly available.
📝 Abstract
Remote photoplethysmography (rPPG) is a non-contact method for detecting physiological signals from facial videos, holding high potential for various applications. Due to the periodic nature of rPPG signals, the long-range dependency capturing capacity of transformers was assumed to be advantageous for such signals. However, existing methods have not conclusively demonstrated that transformers outperform traditional convolutional neural networks. This may be attributed to the quadratic scaling of transformers with sequence length, which results in coarse-grained feature extraction and in turn degrades robustness and generalization. To address this, the paper proposes a periodic sparse attention mechanism based on the temporal attention sparsity induced by periodicity. A pre-attention stage is introduced before the conventional attention mechanism; this stage learns periodic patterns to filter out a large number of irrelevant attention computations, thus enabling fine-grained feature extraction. Moreover, because fine-grained features are more susceptible to noise interference, a fusion stem is proposed to effectively guide self-attention toward rPPG features. It can be easily integrated into existing methods to enhance their performance. Extensive experiments show that the proposed method achieves state-of-the-art performance in both intra-dataset and cross-dataset evaluations. The code is available at https://github.com/zizheng-guo/RhythmFormer.
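To make the core idea concrete, the following is a minimal sketch of sparsity-filtered attention in the spirit described above: a cheap pre-attention pass scores all query-key pairs, and only the top-scoring keys per query enter the full softmax, with the rest masked out. This is a generic top-k illustration, not the paper's actual implementation; the real pre-attention stage learns periodic patterns rather than simply thresholding scores, and the `keep_ratio` parameter is an assumption for demonstration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries become exactly 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def periodic_sparse_attention(q, k, v, keep_ratio=0.25):
    """Sketch of periodicity-induced sparse attention (illustrative only).

    A pre-attention pass scores every query-key pair; for each query,
    only the top-k keys (ideally those aligned with the signal's periodic
    peaks) are kept for the full attention computation, discarding the
    large fraction of irrelevant pairs.
    """
    T, d = q.shape
    scores = q @ k.T / np.sqrt(d)            # pre-attention scores, shape (T, T)
    top_k = max(1, int(T * keep_ratio))      # keys retained per query
    keep = np.argsort(scores, axis=1)[:, -top_k:]  # indices of highest-scoring keys
    mask = np.full_like(scores, -np.inf)     # mask everything by default...
    np.put_along_axis(mask, keep, 0.0, axis=1)     # ...then unmask the kept keys
    attn = softmax(scores + mask, axis=1)    # sparse attention weights
    return attn @ v

# Usage on a toy temporal sequence of 16 frames with 8-dim features.
rng = np.random.default_rng(0)
q = rng.normal(size=(16, 8))
k = rng.normal(size=(16, 8))
v = rng.normal(size=(16, 8))
out = periodic_sparse_attention(q, k, v, keep_ratio=0.25)
```

In this sketch the pre-attention scores reuse the same dot products as the main attention, so the saving is only in the softmax and value aggregation; in the paper's design the pre-attention stage is a separate, cheaper computation whose periodic prior determines which pairs are evaluated at all.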