SVGThinker: Instruction-Aligned and Reasoning-Driven Text-to-SVG Generation

📅 2025-09-29

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Text-to-SVG generation faces two key challenges: poor generalization and weak instruction following. To address these, we propose a reasoning-driven instruction alignment framework that explicitly models the visual reasoning process, generating SVG code step by step, one primitive at a time, so that the output remains complete, editable, and structurally coherent. The method pairs a large language model with multimodal annotation, staged code generation, and supervised fine-tuning. Crucially, we use multimodally annotated data to expose and supervise the chain-of-thought reasoning, which improves reasoning consistency and reduces hallucination. Experiments show that our approach outperforms existing methods in generation stability, editability, and visual quality while preserving the inherent advantages of vector graphics, a step toward practical automated graphic design systems.

📝 Abstract

Scalable Vector Graphics (SVG) is a code-based representation for 2D visuals. Leveraging recent advances in large language models (LLMs), we study text-to-SVG generation and address two persistent gaps: weak generalization and poor adherence to input instructions. We present SVGThinker, a reasoning-driven framework that aligns the production of SVG code with the visualization process and supports the full set of SVG primitives. Our pipeline first renders each primitive in sequence and uses a multimodal model to annotate the image and code; we then build stepwise updates that mirror the incremental addition of primitives. On this data, we train an LLM with supervised fine-tuning that exposes its chain-of-thought as intermediate reasoning, improving robustness and reducing errors and hallucinations. Experiments against state-of-the-art baselines show that SVGThinker produces more stable, editable, and higher-quality SVGs while preserving the structural advantages of vector graphics. Unlike image-based methods, our outputs enable precise and hierarchical editing, opening new directions for design, content creation, and automated graphics generation.
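The stepwise construction described in the abstract can be illustrated with a minimal sketch. It only shows the "incremental addition of primitives" idea: given one SVG, it emits a sequence of partial SVGs, each containing one more primitive than the last. The function name `stepwise_snapshots` is hypothetical, and treating every top-level child element as one primitive is an assumption. The rendering and multimodal annotation steps of the actual pipeline are not shown and may work differently.

```python
import copy
import xml.etree.ElementTree as ET
from typing import List

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize SVG elements without an "ns0:" prefix.
ET.register_namespace("", SVG_NS)


def stepwise_snapshots(svg_source: str) -> List[str]:
    """Return partial SVGs, where snapshot k holds the first k primitives.

    Assumption (not from the paper): each top-level child of <svg>
    counts as one primitive.
    """
    root = ET.fromstring(svg_source)
    primitives = list(root)

    snapshots = []
    for count in range(1, len(primitives) + 1):
        # Recreate the <svg> element with the same attributes but no children.
        partial = ET.Element(root.tag, dict(root.attrib))
        for child in primitives[:count]:
            partial.append(copy.deepcopy(child))
        snapshots.append(ET.tostring(partial, encoding="unicode"))
    return snapshots


example = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    '<rect x="10" y="10" width="80" height="80" fill="skyblue"/>'
    '<circle cx="50" cy="50" r="20" fill="gold"/>'
    '<path d="M 30 70 L 70 70" stroke="black"/>'
    "</svg>"
)

for step, snapshot in enumerate(stepwise_snapshots(example), start=1):
    print(step, snapshot)
```

Each snapshot could then be rendered and paired with a description of the primitive that was just added, which is the kind of intermediate step the abstract says is exposed to the model during fine-tuning.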

Problem

Research questions and friction points this paper is trying to address.

Weak generalization in text-to-SVG generation

Poor adherence to input instructions during SVG creation

Errors and hallucinations in generated vector graphics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reasoning-driven framework for SVG code generation

Multimodal annotation of image and code primitives

Chain-of-thought fine-tuning reduces errors and hallucinations

🔎 Similar Papers

SVGCraft: Beyond Single Object Text-to-SVG Synthesis with Comprehensive Canvas Layout