🤖 AI Summary
Existing diffusion-based face-swapping methods suffer from inadequate identity preservation and visible artifacts under complex poses and expressions, largely because they lack explicit 3D facial structural modeling. To address this, we propose the first diffusion-based face-swapping framework to incorporate 3D facial latent features. Our method jointly conditions the denoising process on 3D morphable parameters, identity embeddings, and facial landmarks, enabling effective disentanglement of identity, pose, and expression. We also introduce a dual-modality evaluation protocol that combines biometric identification metrics with human perceptual studies. Extensive experiments on CelebA, FFHQ, and CelebV-Text show significant improvements over state-of-the-art methods, achieving both high-fidelity visual quality and strong identity consistency.
📝 Abstract
Diffusion-based approaches have recently achieved strong results in face swapping, offering improved visual quality over traditional GAN-based methods. However, even state-of-the-art models often suffer from fine-grained artifacts and poor identity preservation, particularly under challenging poses and expressions. A key limitation of existing approaches is their failure to meaningfully leverage 3D facial structure, which is crucial for disentangling identity from pose and expression. In this work, we propose DiffSwap++, a novel diffusion-based face-swapping pipeline that incorporates 3D facial latent features during training. By guiding the generation process with 3D-aware representations, our method enhances geometric consistency and improves the disentanglement of facial identity from appearance attributes. We further design a diffusion architecture that conditions the denoising process on both identity embeddings and facial landmarks, enabling high-fidelity and identity-preserving face swaps. Extensive experiments on CelebA, FFHQ, and CelebV-Text demonstrate that DiffSwap++ outperforms prior methods in preserving source identity while maintaining target pose and expression. Additionally, we introduce a biometric-style evaluation and conduct a user study to further validate the realism and effectiveness of our approach. Code will be made publicly available at https://github.com/WestonBond/DiffSwapPP.
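The conditioning scheme described above (a denoiser guided jointly by identity embeddings, facial landmarks, and 3D morphable-model parameters) can be sketched in PyTorch. This is a minimal illustrative sketch, not the paper's implementation: all module names, dimensions (e.g. a 512-d ArcFace-style identity embedding, 68 2D landmarks, 62 3DMM coefficients), and the FiLM-style fusion are assumptions chosen for clarity.

```python
# Hypothetical sketch of conditioning a diffusion denoiser on identity,
# landmark, and 3DMM signals. Dimensions and architecture are illustrative
# assumptions, not taken from DiffSwap++.
import torch
import torch.nn as nn


class ConditionedDenoiser(nn.Module):
    def __init__(self, img_ch=3, id_dim=512, lmk_dim=68 * 2,
                 p3d_dim=62, cond_dim=256):
        super().__init__()
        # Project each conditioning signal into a shared space, then fuse.
        self.id_proj = nn.Linear(id_dim, cond_dim)
        self.lmk_proj = nn.Linear(lmk_dim, cond_dim)
        self.p3d_proj = nn.Linear(p3d_dim, cond_dim)
        self.fuse = nn.Linear(3 * cond_dim, cond_dim)
        # Minimal stand-in for a U-Net backbone: one conv encoder/decoder
        # pair, modulated by the fused condition via FiLM-style scale/shift.
        self.enc = nn.Conv2d(img_ch, 64, 3, padding=1)
        self.film = nn.Linear(cond_dim, 2 * 64)
        self.dec = nn.Conv2d(64, img_ch, 3, padding=1)

    def forward(self, x_t, id_emb, landmarks, params3d):
        # Concatenate the three projected conditions and fuse them.
        c = torch.cat([
            self.id_proj(id_emb),
            self.lmk_proj(landmarks.flatten(1)),
            self.p3d_proj(params3d),
        ], dim=1)
        c = self.fuse(c)
        h = torch.relu(self.enc(x_t))
        scale, shift = self.film(c).chunk(2, dim=1)
        h = h * (1 + scale[:, :, None, None]) + shift[:, :, None, None]
        return self.dec(h)  # predicted noise, same shape as x_t


denoiser = ConditionedDenoiser()
x_t = torch.randn(2, 3, 64, 64)     # noisy face images at timestep t
id_emb = torch.randn(2, 512)        # source identity embeddings (assumed)
landmarks = torch.randn(2, 68, 2)   # target 2D facial landmarks (assumed)
params3d = torch.randn(2, 62)       # 3DMM pose/expression coeffs (assumed)
eps_pred = denoiser(x_t, id_emb, landmarks, params3d)
print(eps_pred.shape)  # torch.Size([2, 3, 64, 64])
```

Keeping identity, landmark, and 3DMM signals as separate projections before fusion is one simple way to let the network weight geometric (pose/expression) cues independently of identity cues, which is the disentanglement goal the abstract describes.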