🤖 AI Summary
This study systematically evaluates the applicability of large language models (LLMs) across the end-to-end apparel design pipeline, focusing on three core tasks: fabric selection, construction reconstruction (weave structure, color tone, silhouette), and fit and function adaptation (e.g., fashionwear vs. sportswear). We propose a textile-domain-specific multimodal evaluation framework that integrates perceptual similarity (LPIPS), fine-grained image classification, and text-generation analysis, alongside a new fabric image and attribute dataset. Experiments show that OpenAI models perform best among the three model families compared, both in fabric generation (LPIPS ≈ 0.2, where lower is better) and in construction classification (about 80% accuracy for base construction, about 55% for detailed construction). All models nevertheless struggle to interpret complex fabric constructions and intricate designs from images, even though all of them give useful recommendations in textual form. The work delineates the practical potential and the limitations of LLMs for fashion-creative decision support, and provides a methodology and empirical benchmark for AI-assisted sustainable and personalized apparel design.
📝 Abstract
Fashion has evolved from handcrafted design to automated production, and AI has added another dimension to it. Today, practically every industry uses AI models to automate its operations. To explore their role in fashion, we examined three prominent LLMs (OpenAI, GeminiAI, Deepseek) across multiple stages of textile manufacturing (e.g., sustainable choices, cost-effectiveness, and production planning). We assessed the models' ability to replicate garment designs from specified parameters (fabric construction, shade, weave, silhouette, etc.). We also compared the models across different body types and functional purposes (e.g., fashionwear, sportswear), so that designers can evaluate effectiveness before developing actual patterns, make necessary modifications, and conduct fashion forecasting in advance. To facilitate deeper analysis, we created a custom dataset specifically for fabric image generation and classification. Our analysis revealed that, for fabric construction, the OpenAI DALL-E model integrated with ChatGPT outperformed the other models, achieving an LPIPS (Learned Perceptual Image Patch Similarity) score of approximately $0.2$, where a lower score indicates closer perceptual similarity to the reference image. In fabric classification from images, OpenAI also gave the best results by breaking the task down into specific factors (e.g., breathability, moisture-wicking, and tactile comfort), achieving approximately $80\%$ accuracy for base construction and $55\%$ for detailed construction. In contrast, Deepseek faced significant challenges in both generating and recognizing fabric images. Overall, all three models struggled to recognize complex fabric constructions and intricate designs from images, and we caution that relying too heavily on AI might hinder human creativity. All three models did, however, perform effectively when providing recommendations and insights for fabric design in textual form.