🤖 AI Summary
Existing layout-guided text-to-image generation benchmarks lack joint evaluation of semantic and spatial alignment, hindering accurate assessment of spatial fidelity. To address this, we propose 7Bench, the first comprehensive benchmark for evaluating dual alignment between textual semantics and layout structure, covering seven challenging scenarios. We design a fine-grained layout alignment scoring system, establish a text-and-layout evaluation protocol for diffusion models, and improve quantitative measurement of spatial relations. Extensive experiments reveal performance disparities and limitations of state-of-the-art diffusion models in controlling objects, attributes, colors, and spatial relationships. All code, data, and evaluation tools are publicly released, establishing a standardized, reproducible evaluation platform for controllable image generation research.
📝 Abstract
Layout-guided text-to-image models offer greater control over the generation process by explicitly conditioning image synthesis on the spatial arrangement of elements. As a result, their adoption has increased in many computer vision applications, ranging from content creation to synthetic data generation. A critical challenge is achieving precise alignment between the image, textual prompt, and layout, ensuring semantic fidelity and spatial accuracy. Although recent benchmarks assess text alignment, layout alignment remains overlooked, and no existing benchmark jointly evaluates both. This gap limits the ability to evaluate a model's spatial fidelity, which is crucial when using layout-guided generation for synthetic data, as errors can introduce noise and degrade data quality. In this work, we introduce 7Bench, the first benchmark to assess both semantic and spatial alignment in layout-guided text-to-image generation. It features text-and-layout pairs spanning seven challenging scenarios, investigating object generation, color fidelity, attribute recognition, inter-object relationships, and spatial control. We propose an evaluation protocol that builds on existing frameworks by incorporating the layout alignment score to assess spatial accuracy. Using 7Bench, we evaluate several state-of-the-art diffusion models, uncovering their respective strengths and limitations across diverse alignment tasks. The benchmark is available at https://github.com/Elizzo/7Bench.
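The abstract mentions a layout alignment score for spatial accuracy but does not give its formula here. A common way to score layout fidelity is to detect objects in the generated image and compare their bounding boxes against the conditioning layout via intersection-over-union (IoU). The sketch below illustrates that idea under the assumption of axis-aligned `(x1, y1, x2, y2)` boxes keyed by object label; the function names and matching strategy are illustrative, not 7Bench's actual implementation.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def layout_alignment_score(target_boxes, detected_boxes):
    """Average IoU between each prompted box and the detected box
    with the same label; a missing detection contributes 0.
    target_boxes / detected_boxes: dict mapping label -> box."""
    if not target_boxes:
        return 0.0
    scores = []
    for label, box in target_boxes.items():
        detected = detected_boxes.get(label)
        scores.append(iou(box, detected) if detected else 0.0)
    return sum(scores) / len(scores)
```

In practice the detected boxes would come from an off-the-shelf object detector run on the generated image; the score then penalizes both misplaced and missing objects.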