SVGThinker: Instruction-Aligned and Reasoning-Driven Text-to-SVG Generation

📅 2025-09-29
🤖 AI Summary
Text-to-SVG generation faces two key challenges: poor generalization and weak instruction following. To address these, we propose a reasoning-driven instruction alignment framework that explicitly models the visual reasoning process, enabling stepwise generation of complete, editable, and structurally coherent SVG primitives. Our method integrates large language models with multimodal understanding, incorporating staged code generation and supervised fine-tuning. Crucially, we leverage multimodal annotated data to expose and supervise the chain-of-thought reasoning, thereby enhancing reasoning consistency and mitigating hallucination. Experiments demonstrate that our approach significantly outperforms existing methods in generation stability, editability, and visual fidelity—while preserving the inherent advantages of vector graphics—thus advancing the practical deployment of automated graphic design systems.

📝 Abstract
Scalable Vector Graphics (SVG) is a code-based representation for 2D visuals. Leveraging recent advances in large language models (LLMs), we study text-to-SVG generation and address two persistent gaps: weak generalization and poor adherence to input instructions. We present SVGThinker, a reasoning-driven framework that aligns the production of SVG code with the visualization process and supports the full set of SVG primitives. Our pipeline first renders each primitive in sequence and uses a multimodal model to annotate the image and code; we then build stepwise updates that mirror the incremental addition of primitives. On this data, we train an LLM with supervised fine-tuning that exposes its chain-of-thought as intermediate reasoning, improving robustness and reducing errors and hallucinations. Experiments against state-of-the-art baselines show that SVGThinker produces more stable, editable, and higher-quality SVGs while preserving the structural advantages of vector graphics. Unlike image-based methods, our outputs enable precise and hierarchical editing, opening new directions for design, content creation, and automated graphics generation.
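
The stepwise data construction described in the abstract, where primitives are added one at a time, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `incremental_svgs` and the example drawing are hypothetical, only top-level primitives are considered, and the rendering and multimodal annotation stages are omitted. It only shows how a finished SVG could be split into cumulative steps, each of which adds one primitive to the previous one.

```python
import copy
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Serialize with a default namespace so tags appear as <rect>, not <ns0:rect>.
ET.register_namespace("", SVG_NS)


def incremental_svgs(svg_text: str) -> list[str]:
    """Hypothetical helper: step k contains the first k top-level primitives."""
    root = ET.fromstring(svg_text)
    primitives = list(root)  # top-level child elements, in drawing order
    steps = []
    for k in range(1, len(primitives) + 1):
        # Recreate the <svg> shell with the same attributes (viewBox, etc.).
        partial = ET.Element(root.tag, root.attrib)
        for child in primitives[:k]:
            partial.append(copy.deepcopy(child))
        steps.append(ET.tostring(partial, encoding="unicode"))
    return steps


# Illustrative input, not taken from the paper.
example = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">'
    '<rect x="10" y="10" width="80" height="80" fill="navy"/>'
    '<circle cx="50" cy="50" r="30" fill="gold"/>'
    '<path d="M 35 50 L 65 50" stroke="black"/>'
    "</svg>"
)

steps = incremental_svgs(example)
for index, svg in enumerate(steps, start=1):
    print(f"step {index}: {svg}")
```

Each entry in `steps` is a valid standalone SVG, so in a pipeline like the one described it could be rendered and paired with an annotation of the primitive added at that step.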

Problem

Research questions and friction points this paper is trying to address.

Addresses weak generalization in text-to-SVG generation
Improves adherence to input instructions for SVG creation
Reduces errors and hallucinations in vector graphics generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reasoning-driven framework for SVG code generation
Multimodal annotation of image and code primitives
Chain-of-thought fine-tuning reduces errors and hallucinations
Hanqi Chen
Shanghai Jiao Tong University, SJTU Paris Elite Institute of Technology, Shanghai, China
Zhongyin Zhao
Shanghai Jiao Tong University, Shanghai, China
Ye Chen
Shanghai Jiao Tong University, Shanghai, China
Zhujin Liang
Bigo Live
Computer Vision, Machine Learning, Deep Learning
Bingbing Ni
Shanghai Jiao Tong University, Shanghai, China