🤖 AI Summary
Existing face-swapping methods often compromise target identity preservation and facial expression naturalness, and suffer from temporal inconsistency in videos. This paper proposes a high-fidelity video face-swapping framework. First, it constructs four disentangled facial conditions (identity, expression, pose, and geometry) from 3D facial priors to enable fine-grained, independent control. Second, it introduces a collaborative Face Former–ReferenceNet architecture that decouples high-level identity injection from low-level detail reconstruction. Third, it incorporates a plug-and-play temporal attention mechanism to ensure inter-frame consistency over long video sequences. By integrating diffusion models with 3D Morphable Model (3DMM) priors, the method supports end-to-end video generation. Evaluated on FF++, it achieves state-of-the-art performance: identity similarity improves by 12.6%, expression error decreases by 31.4%, and FID drops by 2.8, demonstrating significant gains in generation stability and fidelity.
📝 Abstract
Face swapping transfers the identity of a source face to a target face while retaining attributes of the target face such as expression, pose, hair, and background. Advanced face-swapping methods have achieved attractive results. However, these methods often inadvertently transfer identity information from the target face, compromising expression-related details and identity accuracy. We propose DynamicFace, a novel method that leverages the power of diffusion models and plug-and-play temporal layers for video face swapping. First, we introduce four fine-grained face conditions using 3D facial priors. All conditions are designed to be disentangled from each other for precise and independent control. Then, we adopt Face Former and ReferenceNet for high-level and detailed identity injection, respectively. Through experiments on the FF++ dataset, we demonstrate that our method achieves state-of-the-art results in face swapping, showcasing superior image quality, identity preservation, and expression accuracy. Moreover, our method can be easily transferred to the video domain with temporal attention layers. Our code and results will be available on the project page: https://dynamic-face.github.io/
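To make the disentangled-conditioning idea concrete, here is a minimal sketch of how four independent condition maps might be assembled into one conditioning tensor for a diffusion backbone. All names, shapes, and the stacking scheme are illustrative assumptions, not the paper's actual API; the point is only that each factor stays in its own channel so it can be controlled separately.

```python
import numpy as np

def build_condition_tensor(identity, expression, pose, geometry):
    """Stack four disentangled (H, W) condition maps into a (4, H, W) tensor.

    Keeping each factor in its own channel (rather than blending them)
    mirrors the disentanglement goal: editing one map leaves the others
    untouched. Shapes and semantics here are hypothetical.
    """
    maps = [identity, expression, pose, geometry]
    if len({m.shape for m in maps}) != 1:
        raise ValueError("all condition maps must share the same spatial size")
    return np.stack(maps, axis=0)

# Toy 8x8 maps standing in for real renderings from a 3DMM fit (assumed):
H, W = 8, 8
cond = build_condition_tensor(
    np.ones((H, W)),                          # identity cue from the source face
    np.zeros((H, W)),                         # expression cue from the target face
    np.full((H, W), 0.5),                     # pose cue from the target face
    np.linspace(0, 1, H * W).reshape(H, W),   # geometry cue from the target face
)
print(cond.shape)  # (4, 8, 8)
```

In a real pipeline, each map would be a rendering derived from the fitted 3DMM (e.g. a shading or landmark image) rather than a constant array, and the stacked tensor would feed the conditional branch of the diffusion model.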