InstaFace: Identity-Preserving Facial Editing with Single Image Inference

📅 2025-02-27
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address critical challenges in single-image face editing—including identity collapse, misaligned hairlines, background distortion, and contextual inconsistency (e.g., hairstyle and accessories)—this paper proposes a parameter-free, 3DMM-guided diffusion framework leveraging multi-view geometric priors. Methodologically: (i) we introduce the first learnable-parameter-free 3D Morphable Model (3DMM) conditioning mechanism, enabling geometry-aware cross-view consistency; (ii) we design a joint embedding module integrating ArcFace-based identity feature distillation and CLIP-driven cross-modal semantic alignment to jointly constrain identity, hairstyle, accessories, and background; (iii) the framework supports fine-grained control over pose, expression, and illumination. Quantitatively, it achieves state-of-the-art performance with +12.6% identity preservation rate, −23.4 FID improvement, and superior edit controllability—demonstrating high-fidelity, contextually consistent editing from a single input image.

Technology Category

Application Category

📝 Abstract
Facial appearance editing is crucial for digital avatars, AR/VR, and personalized content creation, driving realistic user experiences. However, preserving identity with generative models is challenging, especially in scenarios with limited data availability. Traditional methods often require multiple images and still struggle with unnatural face shifts, inconsistent hair alignment, or excessive smoothing effects. To overcome these challenges, we introduce a novel diffusion-based framework, InstaFace, to generate realistic images while preserving identity using only a single image. Central to InstaFace, we introduce an efficient guidance network that harnesses 3D perspectives by integrating multiple 3DMM-based conditionals without introducing additional trainable parameters. Moreover, to ensure maximum identity retention as well as preservation of background, hair, and other contextual features like accessories, we introduce a novel module that utilizes feature embeddings from a facial recognition model and a pre-trained vision-language model. Quantitative evaluations demonstrate that our method outperforms several state-of-the-art approaches in terms of identity preservation, photorealism, and effective control of pose, expression, and lighting.
Problem

Research questions and friction points this paper is trying to address.

Preserve identity in facial editing with single image.
Overcome unnatural face shifts and inconsistent hair alignment.
Ensure identity retention and contextual feature preservation.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Single-image facial editing with identity preservation
Efficient guidance network using 3DMM-based conditionals
Feature embeddings for identity and context retention
🔎 Similar Papers
M
MD Wahiduzzaman Khan
University of Technology Sydney, Australia
Mingshan Jia
Mingshan Jia
Lecturer, University of Technoly Sydney
Complex Networked SystemsMachine Learning
S
Shaolin Zhang
University of Technology Sydney, Australia; Shandong University of Science and Technology, China
E
En Yu
University of Technology Sydney, Australia; Shandong University of Science and Technology, China
K
Kaska Musial-Gabrys
University of Technology Sydney, Australia