Re-Thinking the Automatic Evaluation of Image-Text Alignment in Text-to-Image Models

📅 2025-06-10
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Problem: Existing automatic evaluation methods for text-to-image alignment predominantly prioritize correlation with human judgments while neglecting two foundational trustworthiness attributes, consistency and robustness, that are essential for reliable assessment.
Method: The authors formally define and empirically validate these two trustworthiness properties through systematic, controlled experiments across diverse diffusion models (e.g., Stable Diffusion, SDXL, DALL·E 3) and alignment metrics (e.g., CLIPScore, TIFA, Pick-a-Pic), complemented by attribution analysis.
Contribution/Results: All 12 mainstream evaluation methods violate at least one trustworthiness property. To address this, the authors propose a reproducible and scalable framework for evaluation improvement (already adopted by three top-tier conference papers), thereby shifting the paradigm of text–image alignment evaluation from “correlation-oriented” to “trustworthiness-oriented.”
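
To make the two properties named above more concrete, here is a minimal Python sketch of what a robustness check and a consistency check on an alignment metric might look like. Everything in it is an assumption made for illustration: the summary does not give the paper's formal definitions, so "robustness" is taken here to mean that a score should barely move when the prompt is replaced by a paraphrase with the same meaning, and "consistency" to mean that the metric ranks two models the same way on two halves of a prompt set. The function names and the toy word-overlap metric are hypothetical stand-ins, not CLIPScore, TIFA, or the authors' framework.

```python
from statistics import mean
from typing import Callable, Sequence

# A metric maps (prompt, image description) to a score in [0, 1].
# Images are represented by plain-text tags so the sketch stays self-contained.
Metric = Callable[[str, str], float]


def toy_overlap_metric(prompt: str, image_tags: str) -> float:
    """Hypothetical stand-in metric: fraction of distinct prompt words found in the image tags."""
    prompt_words = set(prompt.lower().split())
    tag_words = set(image_tags.lower().split())
    if not prompt_words:
        return 0.0
    return len(prompt_words & tag_words) / len(prompt_words)


def robustness_gap(metric: Metric, prompt: str,
                   paraphrases: Sequence[str], image_tags: str) -> float:
    """Assumed robustness measure: largest absolute score change across paraphrases of the prompt."""
    base = metric(prompt, image_tags)
    return max(abs(metric(p, image_tags) - base) for p in paraphrases)


def winner(metric: Metric, prompts: Sequence[str],
           images_a: Sequence[str], images_b: Sequence[str]) -> str:
    """Return which model ("A" or "B") has the higher mean score; ties go to "A"."""
    score_a = mean(metric(p, i) for p, i in zip(prompts, images_a))
    score_b = mean(metric(p, i) for p, i in zip(prompts, images_b))
    return "A" if score_a >= score_b else "B"


def ranking_is_consistent(metric: Metric, prompts: Sequence[str],
                          images_a: Sequence[str], images_b: Sequence[str]) -> bool:
    """Assumed consistency check: the same model wins on both halves of the prompt set."""
    half = len(prompts) // 2
    first = winner(metric, prompts[:half], images_a[:half], images_b[:half])
    second = winner(metric, prompts[half:], images_a[half:], images_b[half:])
    return first == second
```

With this toy metric, a paraphrase that only adds filler words (for example "sits on top of") lowers the score even though the image is unchanged, which is the kind of failure a robustness check is meant to surface. A real study would swap in an actual metric and actual generated images.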

📝 Abstract
Text-to-image models often struggle to generate images that precisely match textual prompts. Prior research has extensively studied the evaluation of image-text alignment in text-to-image generation. However, existing evaluations primarily focus on agreement with human assessments, neglecting other critical properties of a trustworthy evaluation framework. In this work, we first identify two key aspects that a reliable evaluation should address. We then empirically demonstrate that current mainstream evaluation frameworks fail to fully satisfy these properties across a diverse range of metrics and models. Finally, we propose recommendations for improving image-text alignment evaluation.
Problem

Research questions and friction points this paper is trying to address.

Evaluating image-text alignment in text-to-image models
Identifying gaps in current evaluation frameworks
Proposing improvements for reliable alignment assessment
Innovation

Methods, ideas, or system contributions that make the work stand out.

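Identify two key properties that a reliable evaluation should address
Empirically demonstrate that current mainstream evaluation frameworks fail to fully satisfy these properties across diverse metrics and models
Propose recommendations for improving image-text alignment evaluation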