FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization

📅 2025-07-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the limited pose and lighting control in text-driven personalized garment visualization for fashion e-commerce, this paper proposes FashionPose, which the authors describe as the first unified text-to-pose-to-relighting generation framework. Given a natural language description, the method predicts a 2D human pose from the text instead of relying on explicit pose annotations, uses a diffusion model to synthesize a high-fidelity person image, and then applies a lightweight relighting module guided by the same text. The reported experiments demonstrate fine-grained pose synthesis and efficient, consistent relighting, which supports its use for personalized virtual fashion display.

📝 Abstract
Realistic and controllable garment visualization is critical for fashion e-commerce, where users expect personalized previews under diverse poses and lighting conditions. Existing methods often rely on predefined poses, limiting semantic flexibility and illumination adaptability. To address this, we introduce FashionPose, the first unified text-to-pose-to-relighting generation framework. Given a natural language description, our method first predicts a 2D human pose, then employs a diffusion model to generate high-fidelity person images, and finally applies a lightweight relighting module, all guided by the same textual input. By replacing explicit pose annotations with text-driven conditioning, FashionPose enables accurate pose alignment, faithful garment rendering, and flexible lighting control. Experiments demonstrate fine-grained pose synthesis and efficient, consistent relighting, providing a practical solution for personalized virtual fashion display.
Problem

Research questions and friction points this paper is trying to address.

Generates personalized fashion images from text descriptions
Overcomes limitations of predefined poses and lighting conditions
Unifies pose prediction, image generation, and relighting control
Innovation

Methods, ideas, or system contributions that make the work stand out.

Text-driven 2D human pose prediction
Diffusion model for high-fidelity image generation
Lightweight relighting module for flexible control
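
The three components listed above can be pictured as stages chained on one shared prompt. The following is a minimal sketch of that data flow only, not the authors' implementation: the `FashionPipeline` class, the stub functions, the three-joint pose, and the "dim" keyword rule are all hypothetical placeholders standing in for the learned models.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Keypoints = List[Tuple[float, float]]  # 2D joints in normalized image coordinates
Image = List[List[float]]              # placeholder grayscale image, values in [0, 1]


@dataclass
class FashionPipeline:
    """Chains three text-conditioned stages: pose, image, relighting (hypothetical)."""
    predict_pose: Callable[[str], Keypoints]
    generate_image: Callable[[str, Keypoints], Image]
    relight: Callable[[str, Image], Image]

    def __call__(self, prompt: str) -> Image:
        # The same prompt conditions every stage, as the abstract describes.
        pose = self.predict_pose(prompt)
        image = self.generate_image(prompt, pose)
        return self.relight(prompt, image)


def stub_pose(prompt: str) -> Keypoints:
    # Stand-in for text-driven pose prediction: a fixed head/hip/foot pose.
    return [(0.5, 0.2), (0.5, 0.5), (0.5, 0.9)]


def stub_generate(prompt: str, pose: Keypoints) -> Image:
    # Stand-in for the diffusion model: a flat mid-gray 4x4 image.
    return [[0.5] * 4 for _ in range(4)]


def stub_relight(prompt: str, image: Image) -> Image:
    # Stand-in for the relighting module: halve brightness if the prompt says "dim".
    gain = 0.5 if "dim" in prompt.lower() else 1.0
    return [[min(1.0, px * gain) for px in row] for row in image]


pipeline = FashionPipeline(stub_pose, stub_generate, stub_relight)
result = pipeline("a model walking forward in dim evening light")
```

Passing the stages in as callables keeps the sketch independent of any particular pose predictor, diffusion backbone, or relighting network; each stub could be swapped for a real model with the same signature.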
Chuancheng Shi
The University of Sydney
Yixiang Chen
The University of Sydney
Burong Lei
Shenyang Aerospace University
Jichao Chen
EURECOM, France
Machine Learning, Artificial Intelligence, Sensing, Robotics