Graphic-Design-Bench: A Comprehensive Benchmark for Evaluating AI on Graphic Design Tasks

📅 2026-04-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing AI evaluation benchmarks fail to adequately address core challenges in professional graphic design, such as structured layout composition, high-fidelity text rendering, layered visual hierarchy, vector-based generation, and animation logic. To bridge this gap, this work introduces GraphicDesignBench (GDB), the first comprehensive benchmark tailored for professional graphic design, encompassing five key dimensions: layout, typography, infographics, template semantics, and animation. GDB supports both understanding and generation tasks and is built upon real-world design templates and the LICA layered-composition dataset, comprising 50 distinct tasks. It incorporates standardized metrics covering spatial accuracy, perceptual quality, text fidelity, semantic alignment, and structural validity. Evaluation results reveal that current models exhibit significant deficiencies in complex spatial reasoning, vector code generation, fine-grained font perception, and temporal decomposition of animations, with particularly pronounced gaps in tasks demanding structural precision.
📝 Abstract
We introduce GraphicDesignBench (GDB), the first comprehensive benchmark suite designed specifically to evaluate AI models on the full breadth of professional graphic design tasks. Unlike existing benchmarks that focus on natural-image understanding or generic text-to-image synthesis, GDB targets the unique challenges of professional design work: translating communicative intent into structured layouts, rendering typographically faithful text, manipulating layered compositions, producing valid vector graphics, and reasoning about animation. The suite comprises 50 tasks organized along five axes: layout, typography, infographics, template & design semantics, and animation, each evaluated under both understanding and generation settings, and grounded in real-world design templates drawn from the LICA layered-composition dataset. We evaluate a set of frontier closed-source models using a standardized metric taxonomy covering spatial accuracy, perceptual quality, text fidelity, semantic alignment, and structural validity. Our results reveal that current models fall short on the core challenges of professional design: spatial reasoning over complex layouts, faithful vector code generation, fine-grained typographic perception, and temporal decomposition of animations remain largely unsolved. While high-level semantic understanding is within reach, the gap widens sharply as tasks demand precision, structure, and compositional awareness. GDB provides a rigorous, reproducible testbed for tracking progress toward AI systems that can function as capable design collaborators. The full evaluation framework is publicly available.
Problem

Research questions and friction points this paper is trying to address.

graphic design
AI evaluation
layout reasoning
vector graphics
typography
Innovation

Methods, ideas, or system contributions that make the work stand out.

graphic design benchmark
structured layout generation
typographic fidelity
vector graphics synthesis
animation reasoning
Adrienne Deganutti
Elad Hirsch
Haonan Zhu
Jaejung Seol
Purvanshi Mehta
Microsoft
Graph Neural Networks · Multimodal Learning · NLP