DiffSwap++: 3D Latent-Controlled Diffusion for Identity-Preserving Face Swapping

📅 2025-11-04
🤖 AI Summary
Existing diffusion-based face-swapping methods suffer from inadequate identity preservation and visible artifacts under complex poses and expressions, primarily due to the absence of explicit 3D facial structural modeling. To address this, we propose the first diffusion-based face-swapping framework incorporating 3D facial latent features. Our method jointly conditions the denoising process on 3D morphable parameters, identity embeddings, and facial landmarks—enabling effective disentanglement of identity, pose, and expression. Generation is driven by a 3D-aware representation, and we introduce a dual-modality evaluation protocol combining biometric identification metrics with human perceptual studies. Extensive experiments on CelebA, FFHQ, and CelebV-Text demonstrate significant improvements over state-of-the-art methods, achieving both high-fidelity visual quality and strong identity consistency.
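The biometric side of the dual-modality evaluation mentioned above typically reduces to comparing face embeddings of the swapped result against the source identity. A minimal sketch, assuming pre-extracted, fixed-length embeddings (e.g. from a face-recognition backbone; the function names and the use of cosine similarity are illustrative assumptions, not the paper's released code):

```python
import numpy as np

def identity_similarity(swap_emb: np.ndarray, source_emb: np.ndarray) -> float:
    """Cosine similarity between two L2-normalised face embeddings."""
    a = swap_emb / np.linalg.norm(swap_emb)
    b = source_emb / np.linalg.norm(source_emb)
    return float(a @ b)

def retrieval_accuracy(swap_embs: np.ndarray,
                       gallery_embs: np.ndarray,
                       labels: np.ndarray) -> float:
    """Top-1 identification: each swapped face should retrieve its source
    identity from a gallery of identity embeddings (rows)."""
    # Row-normalise, compute the full similarity matrix, take argmax per query.
    q = swap_embs / np.linalg.norm(swap_embs, axis=1, keepdims=True)
    g = gallery_embs / np.linalg.norm(gallery_embs, axis=1, keepdims=True)
    pred = (q @ g.T).argmax(axis=1)
    return float((pred == labels).mean())
```

The human-perceptual half of the protocol (user studies) has no code analogue; this sketch covers only the identification-metric half.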

📝 Abstract
Diffusion-based approaches have recently achieved strong results in face swapping, offering improved visual quality over traditional GAN-based methods. However, even state-of-the-art models often suffer from fine-grained artifacts and poor identity preservation, particularly under challenging poses and expressions. A key limitation of existing approaches is their failure to meaningfully leverage 3D facial structure, which is crucial for disentangling identity from pose and expression. In this work, we propose DiffSwap++, a novel diffusion-based face-swapping pipeline that incorporates 3D facial latent features during training. By guiding the generation process with 3D-aware representations, our method enhances geometric consistency and improves the disentanglement of facial identity from appearance attributes. We further design a diffusion architecture that conditions the denoising process on both identity embeddings and facial landmarks, enabling high-fidelity and identity-preserving face swaps. Extensive experiments on CelebA, FFHQ, and CelebV-Text demonstrate that DiffSwap++ outperforms prior methods in preserving source identity while maintaining target pose and expression. Additionally, we introduce a biometric-style evaluation and conduct a user study to further validate the realism and effectiveness of our approach. Code will be made publicly available at https://github.com/WestonBond/DiffSwapPP
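The conditioning scheme the abstract describes, a denoiser guided jointly by identity embeddings, 3D morphable-model parameters, and facial landmarks, can be sketched as below. This is a toy illustration under assumed dimensions (the 512-d identity embedding, 62-d 3DMM vector, 68-point landmarks, and all layer sizes are hypothetical; the actual architecture awaits the authors' code release):

```python
import torch
import torch.nn as nn

class ConditionedDenoiser(nn.Module):
    """Toy epsilon_theta(x_t, t, c): the condition c concatenates a 3DMM
    parameter vector, an identity embedding, and flattened 2D landmarks."""

    def __init__(self, img_dim=64, id_dim=512, p3d_dim=62,
                 lmk_dim=136, hidden=256):
        super().__init__()
        cond_dim = id_dim + p3d_dim + lmk_dim
        self.cond_proj = nn.Linear(cond_dim, hidden)
        self.net = nn.Sequential(
            nn.Linear(img_dim + hidden + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, img_dim),
        )

    def forward(self, x_t, t, id_emb, p3d, landmarks):
        # Fuse the three conditioning signals into one projected vector.
        c = self.cond_proj(
            torch.cat([id_emb, p3d, landmarks.flatten(1)], dim=1))
        # Predict the noise from the noisy latent, condition, and timestep.
        h = torch.cat([x_t, c, t[:, None].float()], dim=1)
        return self.net(h)
```

A real implementation would condition a U-Net via cross-attention rather than concatenating flat vectors, but the sketch shows the disentanglement idea: identity comes from the embedding while pose and expression are carried by the 3DMM parameters and landmarks.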
Problem

Research questions and friction points this paper is trying to address.

Improving identity preservation in face swapping under challenging poses
Addressing fine-grained artifacts in diffusion-based face swapping methods
Incorporating 3D facial structure to disentangle identity from appearance attributes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Incorporates 3D facial latent features during training
Conditions denoising on identity embeddings and landmarks
Enhances geometric consistency with 3D-aware representations