🤖 AI Summary
This work addresses the low precision and lack of standardized evaluation in natural language (NL)-driven vector graphics editing. To this end, we introduce the first large-scale instruction-driven vector editing benchmark, comprising over 270,000 SVG image pairs with corresponding high-quality natural language instructions. The dataset is built with an automated construction pipeline: (i) CLIP-based semantic embedding matching to select semantically consistent image pairs; (ii) diverse instruction generation using vision-language models (VLMs); and (iii) rigorous quality control via SVG syntax validation and structured editing assessment. Empirical evaluation reveals substantial limitations of current large language and multimodal models in geometric accuracy, topological consistency, and structural fidelity. All data, source code, and evaluation tools are publicly released, establishing a foundational resource and unified evaluation standard for NL-driven vector graphics research.
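The syntax-validation step (iii) can be sketched with Python's standard-library XML parser. This is a minimal illustration, not the paper's actual quality-control code; the helper name `validate_svg` is hypothetical:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

def validate_svg(svg_text: str) -> bool:
    """Return True if the string is well-formed XML whose root
    element is an <svg> tag (namespaced or bare)."""
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        return False
    return root.tag in (f"{{{SVG_NS}}}svg", "svg")

# A well-formed SVG passes; a truncated one is rejected.
good = '<svg xmlns="http://www.w3.org/2000/svg"><rect width="10" height="10"/></svg>'
bad = '<svg xmlns="http://www.w3.org/2000/svg"><rect width="10"'
print(validate_svg(good), validate_svg(bad))  # True False
```

A real pipeline would likely also check that the edited SVG renders without error, but well-formedness filtering alone already removes syntactically invalid model outputs cheaply.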
📝 Abstract
We introduce a large-scale dataset for instruction-guided vector image editing, consisting of over 270,000 SVG image pairs, each annotated with a natural language edit instruction. Our dataset enables training and evaluation of models that modify vector graphics based on textual commands. We describe the data collection process, including image pairing via CLIP similarity and instruction generation with vision-language models. Initial experiments with state-of-the-art large language models reveal that current methods struggle to produce accurate and valid edits, underscoring the challenge of this task. To foster research in natural language-driven vector graphic generation and editing, we publicly release all resources created in this work.
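The CLIP-similarity pairing step can be illustrated schematically: embed each image, then keep pairs whose embeddings exceed a similarity threshold. The toy 4-dimensional vectors and the 0.9 threshold below are illustrative assumptions, not values from the paper:

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def select_pairs(embeddings, threshold=0.9):
    """Return (i, j) index pairs whose embeddings are semantically
    consistent, i.e. cosine similarity at or above the threshold."""
    pairs = []
    for i in range(len(embeddings)):
        for j in range(i + 1, len(embeddings)):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                pairs.append((i, j))
    return pairs

# Toy vectors standing in for CLIP image features.
embs = [
    [1.0, 0.0, 0.0, 0.1],     # icon A
    [0.98, 0.05, 0.0, 0.12],  # edited icon A (similar)
    [0.0, 1.0, 0.2, 0.0],     # unrelated icon B
]
print(select_pairs(embs))  # [(0, 1)]
```

In practice the embeddings would come from a pretrained CLIP image encoder over rasterized SVGs, and the quadratic all-pairs scan would be replaced by approximate nearest-neighbor search at this dataset's scale.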