DynamicFace: High-Quality and Consistent Video Face Swapping using Composable 3D Facial Priors

📅 2025-01-15
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing face-swapping methods often compromise source identity preservation and facial expression naturalness, leading to temporal inconsistency in videos. This paper proposes a high-fidelity video face-swapping framework. First, it constructs four disentangled face conditions (identity, expression, pose, geometry) from 3D facial priors to enable fine-grained conditional control. Second, it introduces a collaborative Face Former–ReferenceNet architecture that decouples high-level identity injection from low-level detail reconstruction. Third, it incorporates a plug-and-play temporal attention mechanism to ensure inter-frame consistency over long video sequences. By integrating diffusion models with 3D facial priors, the method supports end-to-end video generation. Evaluated on FF++, it achieves state-of-the-art performance: identity similarity improves by 12.6%, expression error decreases by 31.4%, and FID drops by 2.8, demonstrating significant gains in generation stability and fidelity.

📝 Abstract
Face swapping transfers the identity of a source face to a target face while retaining attributes of the target face such as expression, pose, hair, and background. Advanced face-swapping methods have achieved attractive results. However, these methods often inadvertently transfer identity information from the target face, compromising expression-related details and accurate identity. We propose a novel method, DynamicFace, that leverages the power of diffusion models and plug-and-play temporal layers for video face swapping. First, we introduce four fine-grained face conditions using 3D facial priors. All conditions are designed to be disentangled from each other for precise and unique control. Then, we adopt Face Former and ReferenceNet for high-level and detailed identity injection. Through experiments on the FF++ dataset, we demonstrate that our method achieves state-of-the-art results in face swapping, showcasing superior image quality, identity preservation, and expression accuracy. Besides, our method can be easily transferred to the video domain with temporal attention layers. Our code and results will be available on the project page: https://dynamic-face.github.io/
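The abstract's "plug-and-play temporal attention layers" refer to a standard mechanism for enforcing inter-frame consistency: per-frame feature maps are reshaped so that attention runs across the time axis at each spatial location, letting every frame borrow information from its neighbors. The sketch below is a minimal NumPy illustration of that mechanism under generic assumptions; the function name, shapes, and random projection weights are hypothetical and not taken from the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def temporal_self_attention(frames, w_q, w_k, w_v):
    """Attend across the time axis independently at each spatial location.

    frames: (T, H, W, C) feature maps for T video frames.
    w_q, w_k, w_v: (C, C) projection matrices (random here; in a trained
    model these would be learned weights).
    Returns features of the same shape, where every spatial position has
    mixed information across frames -- the basic operation a plug-and-play
    temporal layer uses to keep frames consistent.
    """
    t, h, w, c = frames.shape
    # Fold space into the batch dimension so attention runs over time only.
    x = frames.reshape(t, h * w, c).transpose(1, 0, 2)            # (HW, T, C)
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(c), axis=-1)  # (HW, T, T)
    out = attn @ v                                                # (HW, T, C)
    return out.transpose(1, 0, 2).reshape(t, h, w, c)

rng = np.random.default_rng(0)
T, H, W, C = 8, 4, 4, 16
frames = rng.standard_normal((T, H, W, C))
w = [rng.standard_normal((C, C)) / np.sqrt(C) for _ in range(3)]
out = temporal_self_attention(frames, *w)
print(out.shape)  # (8, 4, 4, 16)
```

Because the layer only mixes features along the time axis, it can be inserted into a pretrained image diffusion backbone without disturbing its per-frame spatial layers, which is what makes such layers "plug-and-play".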
Problem

Research questions and friction points this paper is trying to address.

Face Swapping
Identity Preservation
Expression Naturalness
Innovation

Methods, ideas, or system contributions that make the work stand out.

DynamicFace
3D facial modeling
video face swapping
Runqi Wang
Beijing Jiaotong University
Few-Shot Learning · Continual Learning · Multi-Modal
Sijie Xu
Xiaohongshu
Tianyao He
Shanghai Jiao Tong University
Computer Vision
Yang Chen
Xiaohongshu
Wei Zhu
Xiaohongshu
Dejia Song
Xiaohongshu
Nemo Chen
Xiaohongshu
Xu Tang
Xiaohongshu
Yao Hu
Zhejiang University
Machine Learning