🤖 AI Summary
Existing methods struggle to reliably extract scalable vector graphics (SVG) from natural images in real-world scenarios due to challenges such as noise, complex backgrounds, and domain shift, compounded by the absence of a systematic benchmark. This work introduces the first SVG extraction task tailored for real-world conditions and presents WildSVG, a comprehensive benchmark comprising two subsets: Natural WildSVG, which pairs real-world images with ground-truth SVG annotations, and Synthetic WildSVG, generated by rendering complex SVGs and compositing them into realistic scenes. Through systematic evaluation of state-of-the-art multimodal models, we reveal their significant performance degradation under real-world conditions, while demonstrating the effectiveness of iterative refinement strategies. WildSVG thus establishes a foundational resource and standardized evaluation framework to advance future research in this domain.
📝 Abstract
We introduce the task of SVG extraction, which consists of translating specific visual inputs from an image into scalable vector graphics. Existing multimodal models achieve strong results when generating SVGs from clean renderings or textual descriptions, but they fall short in real-world scenarios where natural images introduce noise, clutter, and domain shifts. A central challenge in this direction is the lack of suitable benchmarks. To address this need, we introduce the WildSVG Benchmark, formed by two complementary datasets: Natural WildSVG, built from real images containing company logos paired with their SVG annotations, and Synthetic WildSVG, which blends complex SVG renderings into real scenes to simulate difficult conditions. Together, these resources provide the first foundation for systematic benchmarking of SVG extraction. We benchmark state-of-the-art multimodal models and find that current approaches perform well below what is needed for reliable SVG extraction in real scenarios. Nonetheless, iterative refinement methods point to a promising path forward, and model capabilities are steadily improving.