ID-Booth: Identity-consistent Face Generation with Diffusion Models

2025-04-10
Citations: 1
Influential: 1
AI Summary
Existing diffusion models for face synthesis suffer from a fundamental trade-off between identity consistency and generation diversity: optimizing solely for image quality often compromises identity fidelity, whereas identity-supervised training tends to overfit. This paper proposes ID-Booth, a novel framework that decouples identity representation from the generative process without fine-tuning the backbone diffusion model. It introduces a first-of-its-kind triplet-based identity training objective, jointly optimizing generation quality, identity preservation, and semantic controllability. Built upon a latent diffusion model (LDM), ID-Booth integrates a VAE, text encoder, and a customized denoising network to enable both text-guided and identity-anchored synthesis. Experiments demonstrate significant improvements across multiple benchmarks: +12.7% intra-class identity consistency, +9.3% inter-class separability, and +21% image diversity. Moreover, it enhances few-shot face recognition performance without requiring access to original face data.

Abstract
Recent advances in generative modeling have enabled the generation of high-quality synthetic data that is applicable in a variety of domains, including face recognition. Here, state-of-the-art generative models typically rely on conditioning and fine-tuning of powerful pretrained diffusion models to facilitate the synthesis of realistic images of a desired identity. Yet, these models often do not consider the identity of subjects during training, leading to poor consistency between generated and intended identities. In contrast, methods that employ identity-based training objectives tend to overfit on various aspects of the identity, and in turn, lower the diversity of images that can be generated. To address these issues, we present in this paper a novel generative diffusion-based framework, called ID-Booth. ID-Booth consists of a denoising network responsible for data generation, a variational auto-encoder for mapping images to and from a lower-dimensional latent space and a text encoder that allows for prompt-based control over the generation procedure. The framework utilizes a novel triplet identity training objective and enables identity-consistent image generation while retaining the synthesis capabilities of pretrained diffusion models. Experiments with a state-of-the-art latent diffusion model and diverse prompts reveal that our method facilitates better intra-identity consistency and inter-identity separability than competing methods, while achieving higher image diversity. In turn, the produced data allows for effective augmentation of small-scale datasets and training of better-performing recognition models in a privacy-preserving manner. The source code for the ID-Booth framework is publicly available at https://github.com/dariant/ID-Booth.
Problem

Research questions and friction points this paper is trying to address.

Ensures identity consistency in face generation
Balances identity fidelity and image diversity
Enhances privacy-preserving face recognition training
Innovation

Methods, ideas, or system contributions that make the work stand out.

Denoising network for identity-consistent face generation
Variational auto-encoder for latent space mapping
Triplet identity training objective for consistency
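To make the idea of a triplet identity objective concrete, the following is a minimal sketch of a generic triplet loss on identity embeddings. It is not the formulation used in ID-Booth, which is not spelled out on this page. The cosine distance, the margin value, and the function names `cosine_distance` and `triplet_identity_loss` are all assumptions made for illustration. The anchor would be the embedding of a generated face, the positive an embedding of the same identity, and the negative an embedding of a different identity.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def triplet_identity_loss(anchor, positive, negative, margin=0.5):
    """Generic triplet loss (illustrative; margin value is an assumption).

    Pulls the anchor toward the positive (same identity) and pushes it
    away from the negative (different identity) by at least `margin`.
    """
    d_pos = cosine_distance(anchor, positive)
    d_neg = cosine_distance(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-D embeddings, chosen only to keep the arithmetic simple.
same_identity = triplet_identity_loss([1.0, 0.0], [1.0, 0.0], [0.0, 1.0])
wrong_identity = triplet_identity_loss([1.0, 0.0], [0.0, 1.0], [1.0, 0.0])
```

In this sketch the loss is zero when the anchor already sits closer to the positive than to the negative by at least the margin, and it grows when the anchor drifts toward the wrong identity. In a diffusion setting, a term of this kind would typically be added to the usual denoising loss.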
Darian Tomašević
University of Ljubljana, Faculty of Computer and Information Science, Ljubljana, Slovenia
Fadi Boutros
Research scientist, Fraunhofer Institute for Computer Graphics Research IGD
Biometrics, Face recognition, Generative AI, Computer Vision
Chenhao Lin
Xi'an Jiaotong University, School of Cyber Science and Engineering, Xi'an, China
N. Damer
Fraunhofer Institute for Computer Graphics Research IGD, Darmstadt, Germany
Vitomir Štruc
University of Ljubljana, Faculty of Electrical Engineering, Ljubljana, Slovenia
Peter Peer
University of Ljubljana, Faculty of Computer and Information Science, Slovenia
Computer Vision, Biometry