TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models

📅 2024-11-27
🏛️ arXiv.org
📈 Citations: 5
Influential: 3
🤖 AI Summary
This paper introduces Virtual Try-Off (VTOFF), a novel task that reconstructs standardized garment images from single in-the-wild photos of clothed people, with the core challenge of disentangling and accurately recovering garment shape, texture, and fine-grained patterns. Methodologically, the authors establish the first evaluation setup for VTOFF; design TryOffDiff, a Stable Diffusion architecture conditioned on SigLIP image features, which avoids the pose-transfer and post-processing steps common in traditional virtual try-on (VTON); and adopt DISTS, rather than PSNR or SSIM, for more robust assessment of reconstruction quality. Evaluated on a modified VITON-HD dataset, the approach significantly outperforms pose-transfer and VTON baselines, yielding garments with superior structural accuracy and sharper local detail. The results demonstrate VTOFF's practical utility for generating e-commerce product imagery and its potential as a rigorous benchmark for evaluating generative models.

📝 Abstract
This paper introduces Virtual Try-Off (VTOFF), a novel task focused on generating standardized garment images from single photos of clothed individuals. Unlike traditional Virtual Try-On (VTON), which digitally dresses models, VTOFF aims to extract a canonical garment image, posing unique challenges in capturing garment shape, texture, and intricate patterns. This well-defined target makes VTOFF particularly effective for evaluating reconstruction fidelity in generative models. We present TryOffDiff, a model that adapts Stable Diffusion with SigLIP-based visual conditioning to ensure high fidelity and detail retention. Experiments on a modified VITON-HD dataset show that our approach outperforms baseline methods based on pose transfer and virtual try-on with fewer pre- and post-processing steps. Our analysis reveals that traditional image generation metrics inadequately assess reconstruction quality, prompting us to rely on DISTS for more accurate evaluation. Our results highlight the potential of VTOFF to enhance product imagery in e-commerce applications, advance generative model evaluation, and inspire future work on high-fidelity reconstruction. Demo, code, and models are available at: https://rizavelioglu.github.io/tryoffdiff/
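The "SigLIP-based visual conditioning" the abstract describes can be pictured as swapping the context of the diffusion U-Net's cross-attention: where a text-to-image model attends over text-encoder tokens, TryOffDiff (as described) attends over SigLIP image embeddings of the clothed-person photo. The sketch below is a minimal NumPy illustration of that idea, not the paper's implementation; all dimensions (320-d spatial features, 196 tokens of 768-d SigLIP features) are hypothetical placeholders.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, context, d_k=64, seed=0):
    """Single-head cross-attention: U-Net spatial features (queries)
    attend over conditioning tokens (keys/values). Random projection
    weights stand in for learned ones."""
    rng = np.random.default_rng(seed)
    d_q, d_c = queries.shape[-1], context.shape[-1]
    W_q = rng.standard_normal((d_q, d_k)) / np.sqrt(d_q)
    W_k = rng.standard_normal((d_c, d_k)) / np.sqrt(d_c)
    W_v = rng.standard_normal((d_c, d_q)) / np.sqrt(d_c)
    Q, K, V = queries @ W_q, context @ W_k, context @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d_k))      # (n_queries, n_tokens)
    return attn @ V                              # back to query width

rng = np.random.default_rng(1)
spatial = rng.standard_normal((64, 320))         # 8x8 latent patch tokens (hypothetical)
siglip_tokens = rng.standard_normal((196, 768))  # hypothetical SigLIP token grid

# In VTON-style pipelines `siglip_tokens` would instead be text embeddings;
# conditioning on image tokens is the only structural change sketched here.
out = cross_attention(spatial, siglip_tokens)
print(out.shape)  # (64, 320)
```

Because the conditioning enters only through keys and values, the same U-Net shape accommodates image tokens of any count or width, which is what makes this substitution cheap relative to retraining the whole backbone.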
Problem

Research questions and friction points this paper is trying to address.

Generate standardized garment images from single photos
Reconstruct garment shape, texture, and complex patterns accurately
Improve e-commerce imagery and generative model evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

High-fidelity garment reconstruction via diffusion models
SigLIP-based visual conditioning for precise details
DISTS metric for reliable reconstruction assessment
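The case for DISTS over pixel-aligned metrics can be seen with a toy experiment (illustrative only, not from the paper): a one-pixel shift leaves an image perceptually unchanged, yet PSNR collapses because it compares pixels at identical coordinates. DISTS, which compares deep texture and structure statistics, is far less sensitive to such misalignments.

```python
import numpy as np

def psnr(a, b, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two images in [0, data_range]."""
    mse = np.mean((a - b) ** 2)
    if mse == 0:
        return float("inf")
    return 10 * np.log10(data_range ** 2 / mse)

rng = np.random.default_rng(0)
img = rng.random((128, 128))        # stand-in for a textured garment crop
shifted = np.roll(img, 1, axis=1)   # perceptually near-identical 1-px shift

print(psnr(img, img))      # inf: identical images
print(psnr(img, shifted))  # low score despite visual similarity
```

This sensitivity to tiny geometric offsets is exactly the failure mode the authors cite when arguing that traditional image-generation metrics inadequately assess garment reconstruction quality.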