T2I-ReasonBench: Benchmarking Reasoning-Informed Text-to-Image Generation

📅 2025-08-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the lack of systematic evaluation of semantic reasoning in text-to-image (T2I) models. It introduces T2I-ReasonBench, a multi-task benchmark covering four dimensions: Idiom Interpretation, Textual Image Design, Entity-Reasoning, and Scientific-Reasoning. A two-stage evaluation protocol first uses prompt engineering to construct reasoning-oriented tasks, then combines human judgment with automated metrics to assess both reasoning accuracy and image fidelity, enabling quantitative analysis of how well generated images align with the intended semantics of reasoning-heavy prompts. Empirical evaluation across mainstream T2I models reveals significant bottlenecks in representing abstract concepts and performing logical reasoning. The benchmark provides a reproducible foundation for measuring, comparing, and improving reasoning capabilities in T2I systems.
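The two-stage protocol described above lends itself to a simple evaluation harness. The sketch below is illustrative only: the function names (`expand_prompt`, `evaluate`) and the callable-based scorer interface are assumptions, not the paper's implementation, which combines human judgment with automated metrics.

```python
from dataclasses import dataclass

@dataclass
class EvalResult:
    reasoning_accuracy: float  # 0-1: does the image realize the prompt's intended meaning?
    image_quality: float       # 0-1: visual fidelity of the generated image

def expand_prompt(raw_prompt: str) -> str:
    """Stage 1: unpack the reasoning the T2I model must perform.

    In a real system this would be an LLM call that, for example, rewrites
    an idiom ("spill the beans") into its literal visual meaning ("reveal a
    secret"). A fixed template stands in for that call here (assumption).
    """
    return f"A scene that visually conveys the intended meaning of: {raw_prompt!r}"

def evaluate(generate, score_reasoning, score_quality, raw_prompt: str) -> EvalResult:
    """Stage 2: generate an image, then score it on both axes.

    `generate`, `score_reasoning`, and `score_quality` are caller-supplied
    callables (T2I model, reasoning judge, quality metric); the paper's
    actual scorers are not specified here.
    """
    reference = expand_prompt(raw_prompt)
    image = generate(raw_prompt)
    return EvalResult(
        reasoning_accuracy=score_reasoning(image, reference),
        image_quality=score_quality(image),
    )
```

Keeping the two stages separate makes failures attributable: either the model reasoned wrongly (the wrong scene) or it rendered poorly (a low-fidelity image of the right scene).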

📝 Abstract
We propose T2I-ReasonBench, a benchmark evaluating the reasoning capabilities of text-to-image (T2I) models. It consists of four dimensions: Idiom Interpretation, Textual Image Design, Entity-Reasoning, and Scientific-Reasoning. We propose a two-stage evaluation protocol to assess reasoning accuracy and image quality. We benchmark various T2I generation models and provide a comprehensive analysis of their performance.
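To make the four-dimension structure concrete, here is a minimal, hypothetical aggregation step. The dimension names follow the abstract; the per-prompt score pairs and the `aggregate` helper are illustrative assumptions, not the paper's reporting code.

```python
from statistics import mean

# Dimension names as listed in the abstract.
DIMENSIONS = (
    "Idiom Interpretation",
    "Textual Image Design",
    "Entity-Reasoning",
    "Scientific-Reasoning",
)

def aggregate(scores: dict[str, list[tuple[float, float]]]) -> dict[str, dict[str, float]]:
    """Collapse per-prompt (reasoning_accuracy, image_quality) pairs
    into per-dimension mean scores, one entry per benchmark dimension."""
    report = {}
    for dim in DIMENSIONS:
        pairs = scores.get(dim, [])
        if pairs:  # skip dimensions with no scored prompts
            report[dim] = {
                "reasoning_accuracy": mean(p[0] for p in pairs),
                "image_quality": mean(p[1] for p in pairs),
            }
    return report

# Illustrative call with made-up scores for a single dimension:
print(aggregate({"Idiom Interpretation": [(0.4, 0.8), (0.6, 0.9)]}))
```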
Problem

Research questions and friction points this paper is trying to address.

T2I models lack systematic evaluation of their semantic reasoning capabilities
Existing metrics do not jointly assess reasoning accuracy and image quality
No standardized benchmark compares T2I models across distinct reasoning dimensions
Innovation

Methods, ideas, or system contributions that make the work stand out.

A two-stage evaluation protocol separating reasoning accuracy from image quality
A benchmark spanning four reasoning dimensions: Idiom Interpretation, Textual Image Design, Entity-Reasoning, and Scientific-Reasoning
A comprehensive performance analysis of mainstream T2I generation models