🤖 AI Summary
This work addresses the limited generalizability of existing virtual try-on systems, which perform well on Western clothing and female-centric datasets but struggle with structurally complex and culturally diverse garments. To bridge this gap, the authors introduce BD-VITON, a virtual try-on dataset dedicated to traditional Bangladeshi attire (sarees, panjabis, and salwar kameez) that features both male and female subjects. They establish strong baselines by retraining and evaluating three try-on models (StableViton, HR-VITON, and VITON-HD) on this dataset. The retrained models consistently outperform zero-shot inference in both quantitative metrics and visual quality, underscoring the role of culturally specific data in model generalization. The study thereby addresses a gap in current virtual try-on benchmarks, which underrepresent cultural diversity and garment structural complexity.
📝 Abstract
Although virtual try-on systems have made significant progress with the advent of diffusion models, current benchmarks for these models are based on datasets dominated by Western-style clothing and female models, which limits their ability to generalize to culturally diverse clothing styles. In this work, we introduce BD-VITON, a virtual try-on dataset focused on Bangladeshi garments, including saree, panjabi, and salwar kameez, and covering both male and female categories. These garments present unique structural challenges, such as complex draping, asymmetric layering, and high deformation complexity, which are underrepresented in the original VITON dataset. To establish strong baselines, we retrain and evaluate three try-on models, namely StableViton, HR-VITON, and VITON-HD, on our dataset. Our experiments demonstrate consistent improvements over zero-shot inference in both quantitative and qualitative analysis.