Training-free Clothing Region of Interest Self-correction for Virtual Try-On

📅 2025-12-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
In virtual try-on (VTON), generated garments often exhibit inconsistent patterns, textures, and boundaries relative to real clothing, while existing evaluation metrics neglect garment-target alignment. To address this, we propose an attention self-correction mechanism that dynamically constrains the attention maps during inference via an energy-based function—requiring no additional training—to enhance focus on garment-critical regions, thereby improving texture, pattern, and boundary fidelity while preserving non-garment body regions. Furthermore, we introduce VTID, a novel metric quantifying garment-to-target alignment. Evaluated on VITON-HD and DressCode, our method surpasses state-of-the-art approaches, achieving improvements of 1.4% in LPIPS, 2.3% in FID, 12.3% in KID, and 5.8% in VTID. Additionally, it boosts Rank-1 accuracy by up to 2.5% on the CC-ReID task.
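The paper does not spell out its energy function here, but the described mechanism, steering attention mass toward a garment region of interest at inference time without any training, can be sketched as follows. The energy form (`outside - inside` mass), the function names, and the gradient-step loop are all illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def energy(attn, mask):
    """Hypothetical energy: low when attention mass lies inside the garment ROI.
    attn: (H, W) non-negative attention map; mask: (H, W) binary garment mask.
    The paper's actual energy function may differ."""
    inside = (attn * mask).sum()
    outside = (attn * (1.0 - mask)).sum()
    return outside - inside

def self_correct(attn, mask, lr=0.1, steps=10):
    """Training-free correction: descend the energy w.r.t. the attention map
    during inference, then renormalize to keep a valid distribution."""
    a = attn.copy()
    for _ in range(steps):
        # For the energy above, dE/da = (1 - mask) - mask = 1 - 2*mask
        grad = 1.0 - 2.0 * mask
        a = np.clip(a - lr * grad, 0.0, None)
        a = a / (a.sum() + 1e-8)  # renormalize attention mass
    return a

# Toy example: uniform attention leaking outside a 2x2 garment region
mask = np.zeros((4, 4)); mask[1:3, 1:3] = 1.0
attn = np.full((4, 4), 1.0 / 16)
corrected = self_correct(attn, mask)
```

After correction, the attention mass inside the masked garment region increases while the map remains normalized, which is the qualitative behavior the summary describes.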

📝 Abstract
VTON (Virtual Try-On) aims to synthesize the target clothing on a given person, preserving the details of the target clothing while keeping the rest of the person unchanged. Existing methods suffer from discrepancies between the generated clothing and the target in terms of patterns, textures, and boundaries. Therefore, we propose to use an energy function to impose constraints on the attention map extracted during the generation process. Thus, at each generation step, the attention can focus more on the clothing region of interest, guiding the generated results to be more consistent with the target clothing details. Furthermore, to address the limitation that existing evaluation metrics concentrate solely on image realism and overlook alignment with target elements, we design a new metric, Virtual Try-on Inception Distance (VTID), to bridge this gap and ensure a more comprehensive assessment. On the VITON-HD and DressCode datasets, our approach outperforms previous state-of-the-art (SOTA) methods by 1.4%, 2.3%, 12.3%, and 5.8% in the traditional LPIPS, FID, and KID metrics and the new VTID metric, respectively. Additionally, by applying the generated data to downstream Clothing-Change Re-identification (CC-ReID) methods, we achieve Rank-1 improvements of 2.5%, 1.1%, and 1.6% on the LTCC, PRCC, and VC-Clothes datasets, respectively. The code of our method is public at https://github.com/MrWhiteSmall/CSC-VTON.git.
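The abstract names VTID as an Inception-distance-style metric but gives no formula. Since KID and FID both compare Inception feature distributions, a plausible reading is that VTID applies a Fréchet-style distance to features extracted from the garment regions of generated and target images. The sketch below implements the standard Fréchet distance between two feature sets; treating those sets as garment-region Inception features is an assumption, and `frechet_distance` is a hypothetical name, not the paper's API.

```python
import numpy as np

def _sqrtm_psd(m):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    w = np.clip(w, 0.0, None)  # guard against tiny negative eigenvalues
    return (v * np.sqrt(w)) @ v.T

def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussian fits of two (N, D) feature sets,
    as used by FID: |mu_a - mu_b|^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2})."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((S_a S_b)^{1/2}) computed via the symmetric form S_a^{1/2} S_b S_a^{1/2}
    s = _sqrtm_psd(cov_a)
    covmean_trace = np.trace(_sqrtm_psd(s @ cov_b @ s))
    diff = mu_a - mu_b
    return diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * covmean_trace
```

In a VTID-like use, `feats_a` would be Inception features of cropped generated-garment regions and `feats_b` features of the target garments; identical sets yield a distance near zero.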
Problem

Research questions and friction points this paper is trying to address.

Addresses discrepancies in generated clothing patterns, textures, and boundaries
Introduces an energy function to focus attention on clothing regions
Proposes a new metric to assess alignment with target clothing details
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses energy function to constrain attention maps
Introduces VTID metric for comprehensive evaluation
Achieves SOTA performance on multiple datasets
Shengjie Lu
Department of Computer Science and Technology, Soochow University, Suzhou, China
Zhibin Wan
Department of Computer Science and Technology, Soochow University, Suzhou, China
Jiejie Liu
School of Advanced Technology, Xian Jiaotong-Liverpool University, Suzhou, China
Quan Zhang
School of Advanced Technology, Xian Jiaotong-Liverpool University, Suzhou, China
Mingjie Sun
Thinking Machines Lab