🤖 AI Summary
This paper introduces Virtual Try-Off (VTOFF), a novel task that reconstructs a standardized garment image from a single photo of a clothed person. The core challenge is recovering the garment's shape, texture, and fine-grained patterns with high fidelity. Because the target image is well defined, VTOFF also serves as a setting for evaluating reconstruction fidelity in generative models. The proposed model, TryOffDiff, adapts Stable Diffusion with SigLIP-based visual conditioning, so generation is conditioned on image features from the SigLIP encoder, and it needs fewer pre- and post-processing steps than baselines built on pose transfer and virtual try-on (VTON). Since traditional image generation metrics inadequately assess reconstruction quality, the evaluation relies on DISTS. On a modified VITON-HD dataset, the approach outperforms the pose-transfer and VTON baselines. The results highlight VTOFF's potential for generating e-commerce product imagery and for advancing generative model evaluation.
📝 Abstract
This paper introduces Virtual Try-Off (VTOFF), a novel task focused on generating standardized garment images from single photos of clothed individuals. Unlike traditional Virtual Try-On (VTON), which digitally dresses models, VTOFF aims to extract a canonical garment image, posing unique challenges in capturing garment shape, texture, and intricate patterns. This well-defined target makes VTOFF particularly effective for evaluating reconstruction fidelity in generative models. We present TryOffDiff, a model that adapts Stable Diffusion with SigLIP-based visual conditioning to ensure high fidelity and detail retention. Experiments on a modified VITON-HD dataset show that our approach outperforms baseline methods based on pose transfer and virtual try-on while requiring fewer pre- and post-processing steps. Our analysis reveals that traditional image generation metrics inadequately assess reconstruction quality, prompting us to rely on DISTS for more accurate evaluation. Our results highlight the potential of VTOFF to enhance product imagery in e-commerce applications, advance generative model evaluation, and inspire future work on high-fidelity reconstruction. Demo, code, and models are available at: https://rizavelioglu.github.io/tryoffdiff/