Editing Physiological Signals in Videos Using Latent Representations

📅 2025-09-29
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address privacy risks arising from contactless facial video-based physiological monitoring, this paper proposes a steganographic video editing method specifically designed for physiological signal manipulation—enabling controlled modification or suppression of sensitive biometric information (e.g., heart rate) while preserving high-fidelity visual quality. Methodologically, we introduce a learnable framework that synergistically integrates text guidance, adaptive instance normalization (AdaIN), and feature-wise linear modulation (FiLM). The framework jointly leverages spatiotemporal representations from a pretrained 3D variational autoencoder and semantic priors from a frozen text encoder to achieve precise rPPG signal modulation and biometric anonymization. Extensive experiments on multiple benchmark datasets demonstrate state-of-the-art performance: video reconstruction achieves PSNR of 38.96 dB and SSIM of 0.98; heart rate editing attains a mean absolute error of only 10.00 bpm (MAPE = 10.09%), significantly outperforming existing approaches.

Technology Category

Application Category

📝 Abstract
Camera-based physiological signal estimation provides a non-contact and convenient means to monitor Heart Rate (HR). However, the presence of vital signals in facial videos raises significant privacy concerns, as they can reveal sensitive personal information related to the health and emotional states of an individual. To address this, we propose a learned framework that edits physiological signals in videos while preserving visual fidelity. First, we encode an input video into a latent space via a pretrained 3D Variational Autoencoder (3D VAE), while a target HR prompt is embedded through a frozen text encoder. We fuse them using a set of trainable spatio-temporal layers with Adaptive Layer Normalizations (AdaLN) to capture the strong temporal coherence of remote Photoplethysmography (rPPG) signals. We apply Feature-wise Linear Modulation (FiLM) in the decoder with a fine-tuned output layer to avoid the degradation of physiological signals during reconstruction, enabling accurate physiological modulation in the reconstructed video. Empirical results show that our method preserves visual quality with an average PSNR of 38.96 dB and SSIM of 0.98 on selected datasets, while achieving an average HR modulation error of 10.00 bpm MAE and 10.09% MAPE using a state-of-the-art rPPG estimator. Our design's controllable HR editing is useful for applications such as anonymizing biometric signals in real videos or synthesizing realistic videos with desired vital signs.
Problem

Research questions and friction points this paper is trying to address.

Editing physiological signals in facial videos
Preserving visual fidelity during signal modification
Addressing privacy concerns from biometric data exposure
Innovation

Methods, ideas, or system contributions that make the work stand out.

Encodes video into latent space using 3D VAE
Fuses HR prompts via adaptive layer normalizations
Modulates signals with Feature-wise Linear Modulation
🔎 Similar Papers
No similar papers found.