Enhancing Person-to-Person Virtual Try-On with Multi-Garment Virtual Try-Off

📅 2025-04-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses two problems in person-to-person virtual try-on (p2p-VTON): inaccurate modeling of garment texture, shape, and pattern, and the unwanted transfer of non-target attributes such as skin tone. It introduces TryOffDiff, a latent diffusion model for virtual try-off (VTOFF) that extracts a standardized garment image from a photo of a clothed person. The key contributions are: (1) a VTOFF model covering upper-body garments, lower-body garments, and dresses; (2) class-specific embeddings that enable multi-garment VTOFF, which the authors describe as the first of its kind; and (3) SigLIP image conditioning to capture fine garment detail. TryOffDiff achieves state-of-the-art results on VITON-HD and strong performance on DressCode, and pairing it with VTON models improves p2p-VTON by reducing unwanted attribute transfer. The source code is publicly available.

📝 Abstract

Computer vision is transforming fashion through Virtual Try-On (VTON) and Virtual Try-Off (VTOFF). VTON generates images of a person in a specified garment using a target photo and a standardized garment image, while a more challenging variant, Person-to-Person Virtual Try-On (p2p-VTON), uses a photo of another person wearing the garment. VTOFF, on the other hand, extracts standardized garment images from clothed individuals. We introduce TryOffDiff, a diffusion-based VTOFF model. Built on a latent diffusion framework with SigLIP image conditioning, it effectively captures garment properties like texture, shape, and patterns. TryOffDiff achieves state-of-the-art results on VITON-HD and strong performance on the DressCode dataset, covering upper-body garments, lower-body garments, and dresses. Enhanced with class-specific embeddings, it enables multi-garment VTOFF, the first model of its kind. When paired with VTON models, it improves p2p-VTON by minimizing unwanted attribute transfer, such as skin color. Code is available at: https://rizavelioglu.github.io/tryoffdiff/
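The pairing of VTOFF with VTON described in the abstract can be read as a two-stage composition: first extract a standardized garment image from the source person, then run an ordinary VTON model on the target person with that garment. The sketch below illustrates only this data flow. The names `make_p2p_vton`, `fake_vtoff`, and `fake_vton` are hypothetical placeholders, not part of the TryOffDiff code, and the stand-in functions operate on strings rather than images.

```python
from typing import Any, Callable


def make_p2p_vton(
    vtoff: Callable[[Any], Any],
    vton: Callable[[Any, Any], Any],
) -> Callable[[Any, Any], Any]:
    """Compose a VTOFF model and a VTON model into a p2p-VTON pipeline.

    vtoff: maps a photo of a clothed person to a standardized garment image.
    vton:  maps (target person photo, standardized garment image) to a try-on image.
    Both callables are assumptions standing in for real models.
    """

    def p2p_vton(target_person: Any, source_person: Any) -> Any:
        # Stage 1: try-off. Only the garment is kept, so attributes of the
        # source person (for example skin color) are not passed downstream.
        garment = vtoff(source_person)
        # Stage 2: standard try-on with the extracted garment.
        return vton(target_person, garment)

    return p2p_vton


# Stand-in "models" that work on strings, used only to show the data flow.
def fake_vtoff(person: str) -> str:
    return f"garment_of({person})"


def fake_vton(person: str, garment: str) -> str:
    return f"{person}+{garment}"


p2p = make_p2p_vton(fake_vtoff, fake_vton)
result = p2p("target", "source")
print(result)
```

Because the VTON stage only ever sees the extracted garment, any VTON model that accepts a standardized garment image could in principle be plugged in as `vton` without modification.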

Problem

Research questions and friction points this paper is trying to address.

Develops diffusion-based model for multi-garment virtual try-off

Improves person-to-person virtual try-on by reducing unwanted attribute transfer

Achieves state-of-the-art results on VITON-HD and strong performance on DressCode

Innovation

Methods, ideas, or system contributions that make the work stand out.

Diffusion-based VTOFF model TryOffDiff

SigLIP image conditioning for garment properties

Multi-garment VTOFF with class-specific embeddings
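One way to picture class-specific embeddings is as an extra conditioning token that tells the model which garment category to extract. The sketch below is an assumption for illustration, not the paper's implementation: the summary does not say how the class embedding is combined with the SigLIP image features, so prepending a class token is only one plausible option. The names `GARMENT_CLASSES`, `make_class_table`, and `condition_tokens` are hypothetical, and plain Python lists stand in for feature tensors.

```python
import random

# Garment categories named in the abstract.
GARMENT_CLASSES = ("upper-body", "lower-body", "dress")


def make_class_table(dim, seed=0):
    """Build one embedding vector per garment class.

    In a real model these would be learned parameters; here they are
    randomly initialized placeholders.
    """
    rng = random.Random(seed)
    return {
        name: [rng.gauss(0.0, 0.02) for _ in range(dim)]
        for name in GARMENT_CLASSES
    }


def condition_tokens(image_tokens, garment_class, class_table):
    """Prepend the class embedding to the image feature tokens.

    image_tokens: list of feature vectors (stand-ins for image-encoder outputs).
    Returns a new token sequence whose first entry is the class embedding.
    """
    if garment_class not in class_table:
        raise ValueError(f"unknown garment class: {garment_class!r}")
    class_token = class_table[garment_class]
    if any(len(tok) != len(class_token) for tok in image_tokens):
        raise ValueError("image tokens and class embedding must share a dimension")
    return [list(class_token)] + [list(tok) for tok in image_tokens]


table = make_class_table(dim=4)
image_tokens = [[0.0] * 4 for _ in range(3)]  # three dummy feature tokens
cond = condition_tokens(image_tokens, "dress", table)
print(len(cond), len(cond[0]))
```

With this layout the same image features can be reused for each category: only the class token changes between an upper-body, lower-body, or dress extraction.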
