Pixel Cube: Diffusion-based Portrait Video Relighting Through Realistic Lighting Reproduction

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of dynamic portrait video relighting, where achieving photorealism, temporal consistency, and identity preservation simultaneously remains difficult. The authors propose a hybrid approach that integrates real and rendered data, leveraging a pretrained video diffusion model, HDR environment lighting control, and an LED-based illumination capture system. A novel background-guided exposure and tone adjustment mechanism is introduced to harmonize foreground and background appearance. For the first time, this method enables joint conditioning of video diffusion priors with HDR environment maps and synthetic backgrounds, achieving state-of-the-art performance in photorealism, lighting coherence, and temporal stability. The framework generalizes effectively to unseen subjects, poses, and lighting conditions, and has been successfully deployed in practical portrait photography scenarios.

📝 Abstract

We present a diffusion-based method for relighting dynamic portrait videos with photorealism and temporal consistency. Our method is fueled by a hybrid training dataset that consists of real-captured and rendered dynamic portrait videos with diverse subject appearances, facial motions, head poses, and known lighting conditions. Specifically, we construct an LED-based lighting system for realistic lighting emulation and high-speed acquisition of video relighting data. By leveraging the image priors embedded in pre-trained video diffusion models and using a per-frame high dynamic range (HDR) environment map as lighting control, we train a high-performance generative model for realistic and identity-preserving dynamic portrait video relighting. In addition to the environment map control, our model uses a synthesized background image to enable control over the camera's exposure level and color tone. Our model produces temporally consistent relit portrait videos that look realistic and harmonious under a provided new environment, and it faithfully preserves the subject's expression and fine facial features, including skin tone, wrinkles, and facial hair. Our model generalizes well to unseen data in terms of subject appearance, motion, and lighting condition. We perform extensive experiments on relighting in-the-wild videos with various environment maps and demonstrate practical applications in portrait photography. Results show that our method achieves state-of-the-art performance in photorealism, lighting harmony, and temporal consistency.
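
The abstract does not say how exposure level and color tone are parameterized, so the following is only a generic illustration of what those two controls mean for linear HDR values, not the authors' mechanism. It is a minimal sketch that assumes exposure is expressed in photographic stops, color tone as per-channel gains, and display mapping as a Reinhard curve followed by gamma encoding. The function name `expose_and_tonemap` and all of its parameters are hypothetical.

```python
def expose_and_tonemap(hdr_rgb, exposure_stops=0.0, gains=(1.0, 1.0, 1.0), gamma=2.2):
    """Map one linear HDR RGB pixel to display values in [0, 1).

    Illustrative assumptions (not taken from the paper):
    - exposure_stops: each +1 stop doubles the linear radiance
    - gains: per-channel multipliers standing in for a color-tone shift
    - Reinhard compression x / (1 + x), then gamma encoding
    """
    scale = 2.0 ** exposure_stops
    out = []
    for value, gain in zip(hdr_rgb, gains):
        linear = max(value, 0.0) * scale * gain   # clamp negatives, apply exposure and tone
        compressed = linear / (1.0 + linear)      # compress unbounded radiance below 1
        out.append(compressed ** (1.0 / gamma))   # encode for display
    return tuple(out)


# A bright HDR pixel viewed at two exposure settings and with a warmer tone.
pixel = (4.0, 2.0, 1.0)
neutral = expose_and_tonemap(pixel)
darker = expose_and_tonemap(pixel, exposure_stops=-2.0)
warmer = expose_and_tonemap(pixel, gains=(1.2, 1.0, 0.8))
```

Lowering the exposure darkens every channel, while the gains shift the balance between channels without changing the exposure setting. These are the two degrees of freedom that the abstract says are controlled through the synthesized background image.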

Problem

Research questions and friction points this paper is trying to address.

portrait video relighting

photorealism

temporal consistency

dynamic lighting

identity preservation

Innovation

Methods, ideas, or system contributions that make the work stand out.

diffusion-based relighting

HDR environment map

temporal consistency