🤖 AI Summary
This work addresses the lack of hierarchical structure annotations in existing graphic design datasets, which has hindered research on layer-aware editing and structured generation. To bridge this gap, we introduce a large-scale dataset of 1.55 million multi-layer designs, the first to unify static and dynamic layouts under a single hierarchical compositional representation. The dataset covers diverse element types (text, images, and vectors) with rich metadata including spatial relationships, typography, and opacity, and further includes about 27,000 animated layouts annotated with per-component keyframes and motion parameters. Spanning 20 design categories and over 970,000 templates, the dataset enables tasks such as layer-aware inpainting, structured generation, controllable editing, and temporal sequence generation, laying a foundation for vision-language models to understand and generate graphic design at the level of structure.
📝 Abstract
We introduce LICA (Layered Image Composition Annotations), a large-scale dataset of 1,550,244 multi-layer graphic design compositions designed to advance structured understanding and generation of graphic layouts. In addition to rendered PNG images, LICA represents each design as a hierarchical composition of typed components including text, image, vector, and group elements, each paired with rich per-element metadata such as spatial geometry, typographic attributes, opacity, and visibility. The dataset spans 20 design categories and 971,850 unique templates, providing broad coverage of real-world design structures. We further introduce graphic design video as a new and largely unexplored challenge for current vision-language models through 27,261 animated layouts annotated with per-component keyframes and motion parameters. Beyond scale, LICA establishes a new paradigm of research tasks for graphic design, enabling structured investigations into problems such as layer-aware inpainting, structured layout generation, controlled design editing, and temporally-aware generative modeling. By representing design as a system of compositional layers and relationships, the dataset supports research on models that operate directly on design structure rather than pixels alone.
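To make the idea of a hierarchical composition of typed components more concrete, here is a minimal Python sketch of how such a design tree might be modeled and traversed. It is not the LICA schema: the class name `Component`, every field name, and the rule that a group's opacity multiplies into its children are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Component:
    """Hypothetical design element; field names are illustrative, not LICA's."""
    kind: str                      # "text", "image", "vector", or "group"
    x: float = 0.0                 # spatial geometry (assumed layout units)
    y: float = 0.0
    width: float = 0.0
    height: float = 0.0
    opacity: float = 1.0
    visible: bool = True
    font_family: Optional[str] = None          # typographic attribute (text only)
    children: List["Component"] = field(default_factory=list)  # used by groups


def visible_leaves(
    node: Component, inherited_opacity: float = 1.0
) -> Iterator[Tuple[Component, float]]:
    """Yield visible non-group components in tree order with effective opacity.

    Assumption: a hidden node hides its whole subtree, and group opacity
    multiplies into descendants.
    """
    if not node.visible:
        return
    opacity = inherited_opacity * node.opacity
    if node.kind == "group":
        for child in node.children:
            yield from visible_leaves(child, opacity)
    else:
        yield node, opacity


# A toy design: one group holding an image, a text layer, and a hidden vector.
design = Component(
    kind="group",
    opacity=0.5,
    children=[
        Component(kind="image", width=800, height=600),
        Component(kind="text", x=40, y=40, width=300, height=60,
                  font_family="Inter", opacity=0.8),
        Component(kind="vector", visible=False),
    ],
)

layers = [(comp.kind, opacity) for comp, opacity in visible_leaves(design)]
```

A traversal like this is the kind of operation that a pixel-only dataset cannot support: the hidden vector layer is skipped and each remaining layer keeps its own geometry and opacity, which is what tasks such as layer-aware inpainting or controlled editing would rely on.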