Graphic-Design-Bench: A Comprehensive Benchmark for Evaluating AI on Graphic Design Tasks

📅 2026-04-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing AI evaluation benchmarks fail to adequately address core challenges in professional graphic design, such as structured layout composition, high-fidelity text rendering, layered visual hierarchy, vector-based generation, and animation logic. To bridge this gap, this work introduces GraphicDesignBench (GDB), the first comprehensive benchmark tailored for professional graphic design, encompassing five key dimensions: layout, typography, infographics, template semantics, and animation. GDB supports both understanding and generation tasks and is built upon real-world design templates and the LICA layered-composition dataset, comprising 50 distinct tasks. It incorporates standardized metrics covering spatial accuracy, perceptual quality, text fidelity, semantic alignment, and structural validity. Evaluation results reveal that current models exhibit significant deficiencies in complex spatial reasoning, vector code generation, fine-grained font perception, and temporal decomposition of animations, with particularly pronounced gaps in tasks demanding structural precision.
📝 Abstract
We introduce GraphicDesignBench (GDB), the first comprehensive benchmark suite designed specifically to evaluate AI models on the full breadth of professional graphic design tasks. Unlike existing benchmarks that focus on natural-image understanding or generic text-to-image synthesis, GDB targets the unique challenges of professional design work: translating communicative intent into structured layouts, rendering typographically faithful text, manipulating layered compositions, producing valid vector graphics, and reasoning about animation. The suite comprises 50 tasks organized along five axes: layout, typography, infographics, template & design semantics, and animation, each evaluated under both understanding and generation settings, and grounded in real-world design templates drawn from the LICA layered-composition dataset. We evaluate a set of frontier closed-source models using a standardized metric taxonomy covering spatial accuracy, perceptual quality, text fidelity, semantic alignment, and structural validity. Our results reveal that current models fall short on the core challenges of professional design: spatial reasoning over complex layouts, faithful vector code generation, fine-grained typographic perception, and temporal decomposition of animations remain largely unsolved. While high-level semantic understanding is within reach, the gap widens sharply as tasks demand precision, structure, and compositional awareness. GDB provides a rigorous, reproducible testbed for tracking progress toward AI systems that can function as capable design collaborators. The full evaluation framework is publicly available.
Problem

Research questions and friction points this paper is trying to address.

graphic design
AI evaluation
layout reasoning
vector graphics
typography
Innovation

Methods, ideas, or system contributions that make the work stand out.

graphic design benchmark
structured layout generation
typographic fidelity
vector graphics synthesis
animation reasoning
Adrienne Deganutti
Elad Hirsch
Haonan Zhu
Jaejung Seol
Purvanshi Mehta
Microsoft
Graph Neural Networks · Multimodal Learning · NLP