DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment

📅 2024-03-25

🏛️ arXiv.org

📈 Citations: 6

✨ Influential: 0

🤖 AI Summary

Existing GAN-based video-driven face reenactment methods suffer from distortions and visual artifacts or from poor reconstruction quality: the background and fine-grained appearance details (e.g., hair style/color, glasses, accessories) are not faithfully reconstructed. To address these limitations, this work applies Diffusion Autoencoders (DiffAE), which are built on diffusion probabilistic models (DPMs), to face reenactment. It controls the semantic space of a DiffAE to edit the facial pose of the input image, defined as head pose orientation and facial expression. This enables one-shot self- and cross-subject reenactment without subject-specific fine-tuning. Leveraging the photo-realistic generation of DPMs, the method transfers the target pose and expression while preserving the source identity and appearance. In comparisons against state-of-the-art GAN-, StyleGAN2-, and diffusion-based methods, it shows better or on-par reenactment performance.

📝 Abstract

Video-driven neural face reenactment aims to synthesize realistic facial images that successfully preserve the identity and appearance of a source face, while transferring the target head pose and facial expressions. Existing GAN-based methods suffer from either distortions and visual artifacts or poor reconstruction quality, i.e., the background and several important appearance details, such as hair style/color, glasses and accessories, are not faithfully reconstructed. Recent advances in Diffusion Probabilistic Models (DPMs) enable the generation of high-quality realistic images. To this end, in this paper we present DiffusionAct, a novel method that leverages the photo-realistic image generation of diffusion models to perform neural face reenactment. Specifically, we propose to control the semantic space of a Diffusion Autoencoder (DiffAE), in order to edit the facial pose of the input images, defined as the head pose orientation and the facial expressions. Our method allows one-shot, self, and cross-subject reenactment, without requiring subject-specific fine-tuning. We compare against state-of-the-art GAN-, StyleGAN2-, and diffusion-based methods, showing better or on-par reenactment performance.
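The abstract says DiffusionAct edits pose by controlling the semantic space of a Diffusion Autoencoder but does not describe the controller itself. As a rough illustration only, here is a toy sketch of the generic DiffAE mechanics such a method would build on. An image is split into a semantic code `z_sem` and a stochastic code `x_T`, the latter obtained by deterministic DDIM inversion conditioned on `z_sem`. Reenactment is then framed as decoding the source's `x_T` with an edited semantic code. Everything here is an assumption: `semantic_encoder`, `eps_model`, and `edit_semantic_code` are hypothetical stand-ins with random weights, not the paper's networks. The additive offset in `edit_semantic_code` is a placeholder for whatever learned pose/expression control the paper actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)
D, Z, T = 64, 16, 50  # toy image dim, semantic-code dim, number of diffusion steps

# Cumulative signal rates; index 0 is the clean image (alpha_bar = 1).
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.concatenate([[1.0], np.cumprod(1.0 - betas)])

# Hypothetical random weights standing in for trained networks.
W_enc = rng.standard_normal((Z, D)) / np.sqrt(D)
W_cond = rng.standard_normal((D, Z)) / np.sqrt(Z)


def semantic_encoder(x0):
    """Hypothetical stand-in for a DiffAE semantic encoder: image -> z_sem."""
    return np.tanh(W_enc @ x0)


def eps_model(x_k, k, z_sem):
    """Toy conditional noise predictor eps_theta(x_k, k, z_sem)."""
    return 0.1 * x_k + (k / T) * (W_cond @ z_sem)


def ddim_step(x, k_from, k_to, z_sem):
    """Deterministic (eta = 0) DDIM update from noise level k_from to k_to."""
    eps = eps_model(x, k_from, z_sem)
    x0_pred = (x - np.sqrt(1 - alpha_bar[k_from]) * eps) / np.sqrt(alpha_bar[k_from])
    return np.sqrt(alpha_bar[k_to]) * x0_pred + np.sqrt(1 - alpha_bar[k_to]) * eps


def encode_stochastic(x0, z_sem):
    """DDIM inversion: apply the deterministic update forward to obtain x_T."""
    x = x0
    for k in range(T):
        x = ddim_step(x, k, k + 1, z_sem)
    return x


def decode(x_T, z_sem):
    """Conditional DDIM decoding from x_T back to an image."""
    x = x_T
    for k in range(T, 0, -1):
        x = ddim_step(x, k, k - 1, z_sem)
    return x


def edit_semantic_code(z_src, delta):
    """Hypothetical control: shift z_sem by a pose/expression offset."""
    return z_src + delta


# Usage: keep the source's stochastic code, decode with an edited semantic code.
source = rng.standard_normal(D)
z_src = semantic_encoder(source)
x_T = encode_stochastic(source, z_src)
recon = decode(x_T, z_src)  # approximate self-reconstruction
delta = 0.1 * rng.standard_normal(Z)  # placeholder for a target pose/expression offset
reenacted = decode(x_T, edit_semantic_code(z_src, delta))
```

The design point this sketch tries to convey is that `x_T` carries low-level detail while `z_sem` carries high-level attributes. Editing only `z_sem` is therefore one plausible way to change pose and expression while keeping background and appearance. Whether DiffusionAct splits responsibilities exactly this way is not stated in the abstract.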

Problem

Achieve realistic face reenactment preserving identity and appearance

Overcome distortions and artifacts in GAN-based face reenactment methods

Control a diffusion model's semantic space to edit head pose and facial expression without subject-specific fine-tuning

Innovation

Leverages diffusion models for realistic face reenactment

Controls the semantic space of a Diffusion Autoencoder (DiffAE) to edit facial pose (head pose orientation and facial expression)

Enables one-shot self- and cross-subject reenactment without subject-specific fine-tuning