🤖 AI Summary
Existing fashion image editing methods struggle to automatically enhance intrinsic fashionability while preserving human structural integrity. This paper proposes a prompt-free clothing image editing framework based on conditional diffusion models. The approach makes two key contributions: (1) a fashionability enhancement mechanism that jointly optimizes fashion-guided generation with human-structure-preserving constraints; and (2) a dual-track evaluation framework that combines OpenSkill-based probabilistic ranking with multi-expert pairwise comparison annotations along five dimensions, supporting both model training and objective assessment. Experiments demonstrate statistically significant improvements over the Fashion++ baseline in quantitative fashionability metrics, while maintaining high structural fidelity and visual appeal. Expert evaluation and user studies confirm the method's effectiveness in generating structurally coherent and stylistically sophisticated fashion imagery.
📝 Abstract
Image generation in the fashion domain has predominantly focused on preserving body characteristics or following input prompts, while little attention has been paid to improving the inherent fashionability of the output images. This paper presents a novel diffusion-model-based approach that generates fashion images with improved fashionability while maintaining control over key attributes. Key components of our method include: 1) fashionability enhancement, which ensures that the generated images are more fashionable than the input; 2) preservation of body characteristics, which encourages the generated images to maintain the original shape and proportions of the input; and 3) automatic fashion optimization, which does not rely on manual input or external prompts. We also employ two methods to collect the data used to guide generation and to evaluate the generated images. Specifically, multiple fashion experts annotate outfit images through two kinds of pairwise comparison, one OpenSkill-based and one based on five critical aspects, and fashionability scores are derived from these annotations. The two methods provide complementary perspectives for assessing and improving the fashionability of the generated images. The experimental results show that our approach outperforms the Fashion++ baseline in fashionability, demonstrating its effectiveness in producing more stylish and appealing fashion images.
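To make the idea of turning expert pairwise comparisons into per-outfit fashionability scores concrete, here is a minimal sketch. It is not the OpenSkill procedure used in the paper: it substitutes a plain Bradley-Terry model fitted with the standard minorization-maximization update, which is a simpler relative of such rating systems. The function name `bradley_terry`, the `prior` smoothing parameter, and the outfit identifiers are all assumptions made for illustration.

```python
from collections import defaultdict


def bradley_terry(comparisons, iterations=200, prior=0.1):
    """Estimate one strength per item from (winner, loser) pairs.

    Illustrative stand-in only, not the paper's OpenSkill-based ranking.
    `prior` adds that many virtual wins and virtual losses against a
    reference opponent of fixed strength 1.0, which keeps the estimates
    finite for items that never win or never lose.
    """
    wins = defaultdict(float)
    pair_counts = defaultdict(lambda: defaultdict(float))
    for winner, loser in comparisons:
        wins[winner] += 1.0
        wins[loser] += 0.0  # register the loser even if it never wins
        pair_counts[winner][loser] += 1.0
        pair_counts[loser][winner] += 1.0

    strength = {item: 1.0 for item in wins}
    for _ in range(iterations):
        updated = {}
        for item in strength:
            # Virtual games against the reference opponent (strength 1.0).
            denom = 2.0 * prior / (strength[item] + 1.0)
            for other, n in pair_counts[item].items():
                denom += n / (strength[item] + strength[other])
            updated[item] = (wins[item] + prior) / denom
        strength = updated
    return strength


# Hypothetical expert judgments: each tuple is (preferred outfit, other outfit).
judgments = (
    [("outfit_a", "outfit_b")] * 3
    + [("outfit_b", "outfit_c")] * 3
    + [("outfit_a", "outfit_c")]
)
scores = bradley_terry(judgments)
ranking = sorted(scores, key=scores.get, reverse=True)
```

A higher strength means the outfit was preferred more often against comparably strong opponents. Scores of this kind could serve either as a training signal or as an evaluation metric, which are the two roles the abstract describes for the expert annotations.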