Enhancing Person-to-Person Virtual Try-On with Multi-Garment Virtual Try-Off

📅 2025-04-17
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This paper addresses two problems in person-to-person virtual try-on (p2p-VTON): inaccurate modeling of garment texture, shape, and pattern, and erroneous transfer of non-target attributes such as skin tone. It proposes TryOffDiff, a latent diffusion-based framework that extracts standardized garment images across multiple garment categories. The key contributions are: (1) a fine-grained virtual try-off (VTOFF) technique supporting tops, bottoms, and dresses; (2) the first multi-garment VTOFF model, which incorporates class-specific embeddings to suppress cross-category interference; and (3) integration of the SigLIP visual encoder for fine-grained image-conditioned control. The method achieves state-of-the-art results on VITON-HD and strong performance on DressCode, and when paired with VTON models it reduces unwanted attribute transfer in p2p-VTON. The source code is publicly available.

📝 Abstract
Computer vision is transforming fashion through Virtual Try-On (VTON) and Virtual Try-Off (VTOFF). VTON generates images of a person in a specified garment using a target photo and a standardized garment image, while a more challenging variant, Person-to-Person Virtual Try-On (p2p-VTON), uses a photo of another person wearing the garment. VTOFF, on the other hand, extracts standardized garment images from clothed individuals. We introduce TryOffDiff, a diffusion-based VTOFF model. Built on a latent diffusion framework with SigLIP image conditioning, it effectively captures garment properties such as texture, shape, and patterns. TryOffDiff achieves state-of-the-art results on VITON-HD and strong performance on the DressCode dataset, covering upper-body garments, lower-body garments, and dresses. Enhanced with class-specific embeddings, it is the first model to support multi-garment VTOFF. When paired with VTON models, it improves p2p-VTON by minimizing unwanted attribute transfer, such as skin color. Code is available at: https://rizavelioglu.github.io/tryoffdiff/
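
The pairing of VTOFF with VTON described in the abstract can be pictured as a two-stage pipeline. The sketch below is only an illustration under stated assumptions: `p2p_try_on`, `try_off`, `try_on`, and `GARMENT_CLASSES` are hypothetical names introduced here, not the TryOffDiff API, and the two callables stand in for whatever VTOFF and VTON models are actually used.

```python
from typing import Callable, TypeVar

# Placeholder for whatever image type the models consume (assumption).
Image = TypeVar("Image")

# The three garment categories named in the abstract; labels are illustrative.
GARMENT_CLASSES = ("upper-body", "lower-body", "dress")


def p2p_try_on(
    source_person: Image,
    target_person: Image,
    garment_class: str,
    try_off: Callable[[Image, str], Image],
    try_on: Callable[[Image, Image], Image],
) -> Image:
    """Transfer a garment from source_person to target_person in two stages."""
    if garment_class not in GARMENT_CLASSES:
        raise ValueError(f"unknown garment class: {garment_class!r}")
    # Stage 1 (VTOFF): extract a standardized garment image from the source photo.
    garment = try_off(source_person, garment_class)
    # Stage 2 (VTON): dress the target person using only the extracted garment,
    # so the VTON model never sees the source person's other attributes.
    return try_on(target_person, garment)
```

Because the second stage receives only the standardized garment image, attributes of the source person such as skin color have no direct path into the final result, which is the intuition behind the reduced attribute transfer the abstract reports.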

Problem

Research questions and friction points this paper is trying to address.

Develops a diffusion-based model for multi-garment virtual try-off
Improves person-to-person virtual try-on by reducing unwanted attribute transfer
Achieves state-of-the-art results on VITON-HD and strong performance on DressCode

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based VTOFF model TryOffDiff
SigLIP image conditioning for garment properties
Multi-garment VTOFF with class-specific embeddings
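
One way to picture class-specific embeddings is as a per-category vector that is combined with the image features used for conditioning. The source does not say how TryOffDiff fuses the two, so the sketch below makes an explicit assumption: the class vector is prepended as an extra conditioning token. All names (`CLASS_EMBEDDINGS`, `condition_tokens`, `EMBED_DIM`) are hypothetical, the vectors are random stand-ins for learned parameters, and the plain lists stand in for SigLIP image features.

```python
import random

# Illustrative category labels and a tiny embedding size (assumptions).
GARMENT_CLASSES = ("upper-body", "lower-body", "dress")
EMBED_DIM = 4

# One vector per garment class. In a trained model these would be learned;
# here they are random placeholders drawn from a seeded generator.
_rng = random.Random(0)
CLASS_EMBEDDINGS = {
    name: [_rng.gauss(0.0, 1.0) for _ in range(EMBED_DIM)]
    for name in GARMENT_CLASSES
}


def condition_tokens(image_tokens, garment_class):
    """Prepend the class embedding to a sequence of image feature tokens."""
    if garment_class not in CLASS_EMBEDDINGS:
        raise ValueError(f"unknown garment class: {garment_class!r}")
    for token in image_tokens:
        if len(token) != EMBED_DIM:
            raise ValueError("every token must have EMBED_DIM entries")
    # Copy the inputs so the caller's lists are never mutated.
    return [list(CLASS_EMBEDDINGS[garment_class])] + [list(t) for t in image_tokens]
```

With this arrangement the same person image yields a different conditioning sequence for each requested category, which is what would let a single model extract either the top or the bottom from one photo.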