🤖 AI Summary
To address the challenge that end users' lack of fashion expertise leads to ambiguous text prompts and hinders fine-grained garment customization, this paper proposes a conversational image-to-prompt design framework based on large multimodal models (LMMs). The method introduces the Better Understanding Generation (BUG) workflow and establishes FashionEdit, a dataset explicitly aligned with real-world fashion design processes, enabling end-to-end automation from sketch/reference-image interaction and semantically precise prompt generation through to controllable image editing. By integrating multi-granularity control mechanisms with iterative dialog-based feedback, the approach improves alignment between generated outputs and user intent. Experiments on FashionEdit demonstrate superior performance over baselines across three key dimensions: generation similarity, user satisfaction, and design quality. To foster reproducibility and industrial adoption, the authors publicly release both the source code and the FashionEdit dataset.
📝 Abstract
Generative AI is transforming the execution of complex industrial workflows, and large multimodal models (LMMs) in particular are empowering fashion design in the garment industry. Current generative AI models can readily turn a brainstorm into a polished design, but fine-grained customization still suffers from textual ambiguity when end users lack professional background knowledge. We therefore propose the Better Understanding Generation (BUG) workflow, which uses an LMM to automatically create and fine-grain-customize clothing designs through chat, with images converted into prompts. Our framework unleashes users' creative potential beyond words and lowers the barrier to clothing design and editing without further human involvement. To demonstrate the effectiveness of our model, we introduce FashionEdit, a new dataset that simulates the real-world clothing design workflow, and evaluate on generation similarity, user satisfaction, and quality. The code and dataset are available at: https://github.com/detectiveli/FashionEdit.