Enhancing Virtual Try-On with Synthetic Pairs and Error-Aware Noise Scheduling

📅 2025-01-08
🤖 AI Summary
To address the scarcity of real paired (person, garment) training data and the texture distortion common in virtual try-on, this paper combines synthetic data augmentation with error-aware refinement. First, it introduces a garment extraction model that builds (person image, synthesized garment) training pairs from a single image of a clothed person. Second, it proposes the Error-Aware Refinement-based Schrödinger Bridge (EARSB), which refines the output of a base try-on model: a weakly supervised error classifier localizes likely errors, and its confidence heatmap augments the noise schedule so that refinement targets those regions. On VITON-HD and DressCode-Upper, the synthetic pairs improve the performance of prior methods and EARSB improves overall image quality; in user studies, the model is preferred in 59% of cases on average.

📝 Abstract
Given an isolated garment image in a canonical product view and a separate image of a person, the virtual try-on task aims to generate a new image of the person wearing the target garment. Prior virtual try-on works face two major challenges in achieving this goal: a) paired (human, garment) training data has limited availability; b) generating textures on the human that perfectly match those of the prompted garment is difficult, often resulting in distorted text and faded textures. Our work explores ways to tackle these issues through both synthetic data and model refinement. We introduce a garment extraction model that generates (human, synthetic garment) pairs from a single image of a clothed individual. The synthetic pairs can then be used to augment the training of virtual try-on models. We also propose an Error-Aware Refinement-based Schrödinger Bridge (EARSB) that surgically targets localized generation errors to correct the output of a base virtual try-on model. To identify likely errors, we propose a weakly-supervised error classifier that localizes regions for refinement; its confidence heatmap is then used to augment the Schrödinger Bridge's noise schedule. Experiments on VITON-HD and DressCode-Upper demonstrate that our synthetic data augmentation enhances the performance of prior work, while EARSB improves the overall image quality. In user studies, our model is preferred by users in an average of 59% of cases.
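The idea of augmenting a noise schedule with an error-confidence heatmap can be illustrated with a minimal sketch. The abstract does not give EARSB's actual formulation, so everything here is an assumption made for illustration: the function names `error_aware_noise` and `perturb`, the linear scaling rule, the `floor` parameter, and the use of plain Gaussian noise in place of a Schrödinger Bridge process. The sketch shows only the general principle that regions flagged as likely errors receive more noise, and are therefore regenerated more strongly, than regions judged correct.

```python
import numpy as np


def error_aware_noise(base_sigma, confidence, floor=0.0):
    """Return a per-pixel noise level from a scalar schedule value.

    Hypothetical rule (not taken from the paper): the scalar noise level
    `base_sigma` is scaled linearly by the error confidence in [0, 1].
    `floor` is the fraction of noise kept even where confidence is 0.
    """
    confidence = np.clip(np.asarray(confidence, dtype=np.float64), 0.0, 1.0)
    return base_sigma * (floor + (1.0 - floor) * confidence)


def perturb(image, confidence, base_sigma, rng, floor=0.0):
    """Add Gaussian noise to `image`, weighted by the confidence heatmap.

    `image` has shape (H, W) or (H, W, C); `confidence` has shape (H, W).
    """
    sigma_map = error_aware_noise(base_sigma, confidence, floor)
    if image.ndim == 3:
        # Broadcast the (H, W) map across the channel axis.
        sigma_map = sigma_map[..., None]
    noise = rng.standard_normal(image.shape)
    return image + sigma_map * noise


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = np.zeros((4, 4, 3))      # stand-in for a base try-on output
    confidence = np.zeros((4, 4))    # stand-in for the classifier heatmap
    confidence[1, 1] = 1.0           # one pixel flagged as a likely error
    noisy = perturb(image, confidence, base_sigma=0.5, rng=rng)
    print(noisy[0, 0], noisy[1, 1])  # unflagged pixel stays zero
```

With `floor=0.0`, pixels whose confidence is zero are left untouched, so a downstream refinement step would only have to restore the flagged regions. A nonzero `floor` would let every pixel receive some noise, trading locality for robustness to classifier misses.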
Problem

Research questions and friction points this paper is trying to address.

Virtual Try-On
Pattern Reproduction
Matching Images
Innovation

Methods, ideas, or system contributions that make the work stand out.

Virtual Try-On
Error-Aware Noise
Correction Model