Boost Your Human Image Generation Model via Direct Preference Optimization

📅 2024-05-30
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address anatomical distortions and inaccurate pose modeling—key bottlenecks in human image generation that impair photorealism—this paper proposes HG-DPO, the first Direct Preference Optimization (DPO) framework leveraging high-fidelity real images as “preferred” samples. Unlike conventional DPO approaches that rely on generated images as positive examples, HG-DPO constructs fine-grained human-centric preference supervision using real images and introduces a curriculum learning strategy to mitigate training instability. The method integrates text-conditioned diffusion models with contrastive modeling between real and generated images, enabling both identity preservation and personalized text-to-image synthesis. Extensive evaluations across multiple benchmarks demonstrate significant improvements in anatomical plausibility, perceptual realism, and identity consistency, comprehensively outperforming existing state-of-the-art methods.

📝 Abstract
Human image generation is a key focus in image synthesis due to its broad applications, but even slight inaccuracies in anatomy, pose, or details can compromise realism. To address these challenges, we explore Direct Preference Optimization (DPO), which trains models to generate preferred (winning) images while diverging from non-preferred (losing) ones. However, conventional DPO methods use generated images as winning images, limiting realism. To overcome this limitation, we propose an enhanced DPO approach that incorporates high-quality real images as winning images, encouraging outputs to resemble real images rather than generated ones. However, implementing this concept is not a trivial task. Therefore, our approach, HG-DPO (Human image Generation through DPO), employs a novel curriculum learning framework that gradually improves the output of the model toward greater realism, making training more feasible. Furthermore, HG-DPO effectively adapts to personalized text-to-image tasks, generating high-quality and identity-specific images, which highlights the practical value of our approach.
Problem

Research questions and friction points this paper is trying to address.

Improving realism in human image generation models
Enhancing DPO with real images for better outputs
Adapting HG-DPO for personalized text-to-image tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses real images as winning samples in DPO
Introduces curriculum learning for gradual realism
Adapts DPO for personalized text-to-image tasks
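The preference objective described above follows the Diffusion-DPO recipe: the model is pushed to denoise winning images better than a frozen reference model, and losing images worse. HG-DPO's twist is that the winning samples are real photographs rather than generated images. The sketch below is an illustrative reconstruction of that loss, not the authors' implementation; the function name, argument layout, and `beta` value are assumptions.

```python
import numpy as np

def dpo_diffusion_loss(err_w_theta, err_w_ref, err_l_theta, err_l_ref, beta=0.1):
    """Hypothetical sketch of a Diffusion-DPO-style preference loss.

    Each argument is a per-sample denoising error ||eps - eps_pred||^2:
      err_w_*: errors on winning images (in HG-DPO, high-quality real images)
      err_l_*: errors on losing images (generated images)
      *_theta: from the model being trained; *_ref: from a frozen reference.
    """
    # How much better (negative) or worse the trained model denoises
    # each side compared to the reference model.
    diff_w = err_w_theta - err_w_ref
    diff_l = err_l_theta - err_l_ref
    # Prefer improving on winners relative to losers.
    logits = -beta * (diff_w - diff_l)
    # -log(sigmoid(logits)), computed stably via logaddexp.
    loss = np.logaddexp(0.0, -logits)
    return loss.mean()
```

When the model denoises the real (winning) images better than the reference while doing no better on the generated (losing) ones, the logits are positive and the loss drops below log 2, the value at indifference; this is the gradient signal that pulls outputs toward real-image statistics.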
Sanghyeon Na (Kakao Brain)
Yonggyu Kim (Kakao Brain)
Hyunjoon Lee (Kakao Brain)