Reference-Free Image Quality Assessment for Virtual Try-On via Human Feedback

📅 2026-03-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing virtual try-on systems lack a no-reference, single-image quality assessment method that aligns with human perception. To address this gap, this work introduces VTON-QBench—the largest human-annotated benchmark to date for image quality evaluation, covering 14 state-of-the-art virtual try-on models—and proposes VTON-IQA, a Transformer-based framework with an interleaved cross-attention mechanism that explicitly models the interaction between garment fidelity and the preservation of human details. Experiments demonstrate that VTON-IQA produces image-level quality predictions highly consistent with human judgments under no-reference conditions, establishing the first generalizable, perceptually aligned evaluation standard for virtual try-on models.
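
As context for the gap described above: distribution-level metrics such as Fréchet Inception Distance (discussed in the abstract below) compare feature statistics computed over whole sets of images, so they cannot assign a quality score to a single generated image. Below is a minimal sketch of the standard FID computation, assuming precomputed [N, D] Inception feature arrays; the function name and inputs are illustrative, not from the paper.

```python
import numpy as np
from scipy.linalg import sqrtm

def fid(feats_real: np.ndarray, feats_gen: np.ndarray) -> float:
    """Fréchet Inception Distance between two SETS of feature vectors
    (each of shape [N, D], N > 1). Note: it is defined on the mean and
    covariance of a distribution, so it has no meaning for one image."""
    mu1, mu2 = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma1 = np.cov(feats_real, rowvar=False)
    sigma2 = np.cov(feats_gen, rowvar=False)
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard numerical-noise imaginary parts
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))
```

Because the score depends on dataset-level means and covariances, a model can look good under FID/KID while individual outputs vary widely in quality, which is the motivation for a per-image, reference-free predictor.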

📝 Abstract
Given a person image and a garment image, image-based Virtual Try-On (VTON) synthesizes a try-on image of the person wearing the target garment. As VTON systems become increasingly important in practical applications such as fashion e-commerce, reliable evaluation of their outputs has emerged as a critical challenge. In real-world scenarios, ground-truth images of the same person wearing the target garment are typically unavailable, making reference-based evaluation impractical. Moreover, widely used distribution-level metrics such as Fréchet Inception Distance and Kernel Inception Distance measure dataset-level similarity and fail to reflect the perceptual quality of individual generated images. To address these limitations, we propose Image Quality Assessment for Virtual Try-On (VTON-IQA), a reference-free framework for human-aligned, image-level quality assessment that requires no ground-truth images. To model human perceptual judgments, we construct VTON-QBench, a large-scale human-annotated benchmark comprising 62,688 try-on images generated by 14 representative VTON models and 431,800 quality annotations collected from 13,838 qualified annotators. To the best of our knowledge, this is the largest dataset to date for human subjective evaluation in virtual try-on. Evaluating virtual try-on quality requires verifying both garment fidelity and the preservation of person-specific details. To explicitly model such interactions, we introduce an Interleaved Cross-Attention module that extends standard transformer blocks by inserting a cross-attention layer between the self-attention and MLP layers in the later blocks. Extensive experiments show that VTON-IQA achieves reliable, human-aligned image-level quality prediction. Moreover, we conduct a comprehensive benchmark evaluation of 14 representative VTON models using VTON-IQA.
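
To make the module structure concrete, here is a minimal PyTorch sketch of one such block, assuming pre-norm residual wiring and illustrative dimensions; the paper's exact configuration (which blocks receive cross-attention, normalization placement, and where the reference tokens come from) may differ.

```python
import torch
import torch.nn as nn

class InterleavedCrossAttentionBlock(nn.Module):
    """Sketch of a transformer block with a cross-attention layer
    inserted between self-attention and the MLP, as the abstract
    describes. Hypothetical wiring, not the authors' released code."""

    def __init__(self, dim: int = 768, num_heads: int = 12, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm3 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio),
            nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim),
        )

    def forward(self, tryon_tokens: torch.Tensor, ref_tokens: torch.Tensor):
        # Self-attention over the generated try-on image tokens.
        x = tryon_tokens
        h = self.norm1(x)
        x = x + self.self_attn(h, h, h, need_weights=False)[0]
        # Cross-attention: try-on tokens query the reference (garment
        # and/or person) tokens, modeling the interaction between
        # garment fidelity and person-detail preservation.
        h = self.norm2(x)
        x = x + self.cross_attn(h, ref_tokens, ref_tokens, need_weights=False)[0]
        # Position-wise feed-forward network.
        x = x + self.mlp(self.norm3(x))
        return x
```

In this sketch, `tryon_tokens` would be patch tokens of the generated image and `ref_tokens` tokens of the inputs being checked against, so cross-attention lets the quality predictor compare the output to its references while self-attention and the MLP keep image-level context, matching the "interleaved" placement described above.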
Problem

Research questions and friction points this paper is trying to address.

Virtual Try-On
Image Quality Assessment
Reference-Free Evaluation
Human Perception
No Ground Truth
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reference-Free IQA
Virtual Try-On
Human Feedback
Interleaved Cross-Attention
VTON-QBench
Yuki Hirakawa
ZOZO Research, Keio University
Takashi Wada
ZOZO Research
Ryotaro Shimizu
ZOZO Research
Takuya Furusawa
ZOZO Research
Yuki Saito
Lecturer (Sr. Assistant Professor), The University of Tokyo
Speech synthesis, Voice conversion, Machine learning
Ryosuke Araki
ZOZO Inc.
Tianwei Chen
ZOZO Research
Fan Mo
ZOZO Research
Yoshimitsu Aoki
Keio University
Computer vision, Pattern recognition