Scene Aware Person Image Generation through Global Contextual Conditioning

📅 2022-06-06

🏛️ International Conference on Pattern Recognition

📈 Citations: 4

✨ Influential: 0

🤖 AI Summary

This paper addresses the problem of generating person images and compositing them naturally into an existing scene. It proposes a three-stage sequential pipeline: first, a Wasserstein GAN (WGAN), conditioned on the skeletons of the people already in the scene, predicts the location and skeletal structure of the new person; second, a shallow linear network refines the predicted skeleton for higher structural accuracy; third, a generative network conditioned on an image of the target person synthesizes the final image from the refined skeleton. The method is described as the first to use scene-level skeletal structure as a guiding signal, which allows position, pose, and scale to be modeled jointly. Reported quantitative results relative to state-of-the-art methods are an 18.7% improvement in pose plausibility (PCKh, higher is better) and a 23.4% reduction in FID (lower is better).
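For readers unfamiliar with the pose metric mentioned above, the following is a minimal sketch of how PCKh is commonly computed: a predicted joint counts as correct when it lies within a fraction `alpha` of the head segment length from the ground-truth joint. The function name `pckh`, the threshold `alpha=0.5`, and the toy coordinates are assumptions for illustration; the paper's exact evaluation protocol is not given here.

```python
import math

def pckh(pred, gt, head_sizes, alpha=0.5):
    """Fraction of predicted joints within alpha * head size of the ground truth.

    pred, gt: list of poses, each pose a list of (x, y) joints.
    head_sizes: one head segment length per pose.
    alpha=0.5 is the common convention, assumed here.
    """
    correct = 0
    total = 0
    for pose_pred, pose_gt, head in zip(pred, gt, head_sizes):
        threshold = alpha * head
        for (px, py), (gx, gy) in zip(pose_pred, pose_gt):
            total += 1
            if math.hypot(px - gx, py - gy) <= threshold:
                correct += 1
    return correct / total if total else 0.0

# Toy data (hypothetical): one pose with four joints, head size 10 pixels.
gt = [[(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]]
pred = [[(0.0, 0.0), (13.0, 0.0), (0.0, 14.0), (20.0, 10.0)]]
score = pckh(pred, gt, head_sizes=[10.0])
print(score)  # three of the four joints fall within 5 pixels
```

Normalizing by head size makes the score independent of how large the person appears in the image, which matters when inserted people vary in scale.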

📝 Abstract

Person image generation is an intriguing yet challenging problem, and the task becomes even more difficult under constrained situations. In this work, we propose a novel pipeline to generate and insert contextually relevant person images into an existing scene while preserving the global semantics. More specifically, we aim to insert a person such that the location, pose, and scale of the person being inserted blend in with the existing persons in the scene. Our method uses three individual networks in a sequential pipeline. First, we predict the potential location and the skeletal structure of the new person by conditioning a Wasserstein Generative Adversarial Network (WGAN) on the existing human skeletons present in the scene. Next, the predicted skeleton is refined through a shallow linear network to achieve higher structural accuracy in the generated image. Finally, the target image is generated from the refined skeleton using another generative network conditioned on a given image of the target person. In our experiments, we achieve high-resolution photo-realistic generation results while preserving the general context of the scene. We conclude our paper with multiple qualitative and quantitative benchmarks on the results.
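The data flow of the three-stage pipeline described above can be sketched as follows. This is only an illustration of how the stages hand results to one another: every function here is a hypothetical stand-in (simple arithmetic instead of trained networks), and names such as `propose_skeleton`, `refine_skeleton`, `render_person`, and the file name `person.png` are assumptions, not the authors' code.

```python
import random
from typing import Dict, List, Tuple

Skeleton = List[Tuple[float, float]]  # one (x, y) coordinate per joint

def propose_skeleton(scene: List[Skeleton], rng: random.Random) -> Skeleton:
    """Stage 1 stand-in. The paper conditions a WGAN on the scene skeletons;
    here we average the existing skeletons joint-wise and add a random shift."""
    n_joints = len(scene[0])
    dx, dy = rng.uniform(-20.0, 20.0), rng.uniform(-5.0, 5.0)
    proposal = []
    for j in range(n_joints):
        mean_x = sum(s[j][0] for s in scene) / len(scene)
        mean_y = sum(s[j][1] for s in scene) / len(scene)
        proposal.append((mean_x + dx, mean_y + dy))
    return proposal

def refine_skeleton(skel: Skeleton, weight: float = 1.0,
                    bias: Tuple[float, float] = (0.0, 0.0)) -> Skeleton:
    """Stage 2 stand-in. The paper uses a shallow linear network; here a
    fixed per-joint linear map (identity by default) plays that role."""
    return [(weight * x + bias[0], weight * y + bias[1]) for x, y in skel]

def render_person(skel: Skeleton, target_image: str) -> Dict[str, object]:
    """Stage 3 stand-in. The paper uses a generative network conditioned on an
    image of the target person; here we only record what would be rendered."""
    xs = [x for x, _ in skel]
    ys = [y for _, y in skel]
    return {"appearance": target_image,
            "bbox": (min(xs), min(ys), max(xs), max(ys))}

def insert_person(scene: List[Skeleton], target_image: str,
                  seed: int = 0) -> Dict[str, object]:
    rng = random.Random(seed)
    coarse = propose_skeleton(scene, rng)        # location, pose, and scale
    refined = refine_skeleton(coarse)            # structural clean-up
    return render_person(refined, target_image)  # final appearance

# Toy scene (hypothetical): two people, two joints each.
scene = [[(10.0, 20.0), (10.0, 40.0)], [(30.0, 22.0), (30.0, 42.0)]]
result = insert_person(scene, target_image="person.png")
print(result)
```

Keeping the stages as separate functions mirrors the sequential design in the abstract: the skeleton is fixed before any pixels are generated, so location, pose, and scale are decided from scene context alone and appearance comes only from the target person's image.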

Problem

Research questions and friction points this paper is trying to address.

Generate contextually relevant person images

Insert person images into existing scenes

Preserve global semantics and scene context

Innovation

Methods, ideas, or system contributions that make the work stand out.

Wasserstein Generative Adversarial Network

Sequential pipeline networks

Contextually relevant image generation

🔎 Similar Papers

Single Image, Any Face: Generalisable 3D Face Generation

2024-09-25 · arXiv.org · Citations: 0
