Jointly Conditioned Diffusion Model for Multi-View Pose-Guided Person Image Synthesis

📅 2025-11-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address texture incompleteness and poor cross-view consistency in pose-guided human image generation from single-view reference images, this paper proposes a joint conditional diffusion model. Methodologically, it introduces (1) an Appearance Prior Module (APM) that explicitly models multi-view correspondences of identity, color, and texture across poses; and (2) a Joint Conditional Injection (JCI) mechanism that adaptively fuses multi-view features and injects them into the denoising network, supporting a variable number of reference inputs while preserving architectural simplicity. Evaluated on standard benchmarks, the method significantly improves visual fidelity and cross-view geometric-appearance consistency of generated images, achieving state-of-the-art performance. It further demonstrates strong generalization across diverse poses, identities, and clothing styles.

📝 Abstract
Pose-guided human image generation is limited by incomplete textures from single reference views and the absence of explicit cross-view interaction. We present the Jointly Conditioned Diffusion Model (JCDM), a diffusion framework that exploits multi-view priors. The appearance prior module (APM) infers a holistic, identity-preserving prior from incomplete references, and the joint conditional injection (JCI) mechanism fuses multi-view cues and injects shared conditioning into the denoising backbone to align identity, color, and texture across poses. JCDM supports a variable number of reference views and integrates with standard diffusion backbones through minimal, targeted architectural modifications. Experiments demonstrate state-of-the-art fidelity and cross-view consistency.
Problem

Research questions and friction points this paper is trying to address.

Generating person images from incomplete single-view texture references
Addressing lack of explicit cross-view interaction in pose-guided synthesis
Maintaining identity, color, and texture consistency across multiple poses
Innovation

Methods, ideas, or system contributions that make the work stand out.

Jointly conditioned diffusion framework with multi-view priors
Appearance prior module infers holistic identity from references
Joint conditional injection fuses multi-view cues for alignment
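To make the variable-reference-view claim concrete, the sketch below shows one plausible reading of the fusion step: features from any number of reference views are softmax-weighted and summed into a single shared conditioning vector before injection into the denoiser. The function name, the learned-score stand-in, and the weighting scheme are illustrative assumptions, not the paper's actual JCI architecture.

```python
# Illustrative sketch (assumed design): fuse per-view feature vectors from a
# variable number of reference views into one shared conditioning vector.
import math

def fuse_views(view_feats, scores):
    """Softmax-weighted fusion of per-view feature vectors.

    view_feats: list of equal-length feature vectors, one per reference view
    scores:     one relevance score per view (learned in a real model)
    """
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]  # softmax over the views
    dim = len(view_feats[0])
    # The weighted sum is agnostic to the number of views, which is what
    # lets the conditioning accept a variable number of reference inputs.
    return [sum(w + 0.0 if False else w * f[i] for w, f in zip(weights, view_feats))
            for i in range(dim)]

# Two reference views with equal scores; a third view could be appended
# without any architectural change.
cond = fuse_views([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In a full model the fused vector would be injected into the denoising backbone (for example via cross-attention), so that every generated pose is conditioned on the same shared appearance signal.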
👥 Authors

Chengyu Xie
Nanjing University of Science and Technology, China

Zhi Gong
Nanjing University of Science and Technology, China

Junchi Ren
Nanjing University of Science and Technology, China

Linkun Yu
Nanjing University of Science and Technology, China

Si Shen
Hong Kong University of Science and Technology
Data Mining · Web Search

Fei Shen
National University of Singapore
Controllable Generation · Multimodal Safety

Xiaoyu Du
Nanjing University of Science and Technology, China