Bridging Vision, Language, and Mathematics: Pictographic Character Reconstruction with Bézier Curves

📅 2025-10-29

🤖 AI Summary

This paper investigates how well vision-language models (VLMs) can parse the low-level geometric structure of images, using pictographic characters, including oracle bone inscriptions (a prototypical ideographic script), as a test case. Method: The authors formulate pictographic character recognition as a Bézier-curve program synthesis task and train a VLM end to end to map raster images directly to executable programs of Bézier curves, so that the symbols' geometric syntax is modeled explicitly. Contribution/Results: A VLM trained solely on modern Chinese characters reconstructs oracle bone inscriptions zero-shot, suggesting that it acquires abstract, transferable geometric grammar rules. The approach outperforms strong zero-shot baselines such as GPT-4o and offers a new way to probe whether VLMs understand the deep structural regularities of symbolic systems.
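The exact program syntax the model emits is not given here, so the sketch below assumes, purely for illustration, an SVG-like text format in which `M x y` moves the pen and `C x1 y1 x2 y2 x y` appends a cubic Bézier segment. It shows how a VLM's text output could be turned back into explicit geometry; the function name `parse_bezier_program` and the two-command set are hypothetical, not the paper's format.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]
# A cubic segment: (start, control1, control2, end).
Cubic = Tuple[Point, Point, Point, Point]


def parse_bezier_program(program: str) -> List[Cubic]:
    """Parse a hypothetical SVG-like program of 'M' and 'C' commands.

    'M x y' moves the pen and starts a new stroke; 'C x1 y1 x2 y2 x y'
    appends a cubic segment that starts at the current pen position and
    leaves the pen at (x, y).
    """
    tokens = program.replace(",", " ").split()
    curves: List[Cubic] = []
    pen: Optional[Point] = None
    i = 0
    while i < len(tokens):
        cmd = tokens[i]
        if cmd == "M":
            args = tokens[i + 1 : i + 3]
            if len(args) != 2:
                raise ValueError("'M' needs 2 coordinates")
            pen = (float(args[0]), float(args[1]))
            i += 3
        elif cmd == "C":
            if pen is None:
                raise ValueError("'C' appears before any 'M'")
            args = tokens[i + 1 : i + 7]
            if len(args) != 6:
                raise ValueError("'C' needs 6 coordinates")
            x1, y1, x2, y2, x, y = (float(a) for a in args)
            end = (x, y)
            curves.append((pen, (x1, y1), (x2, y2), end))
            pen = end  # the next 'C' continues from this endpoint
            i += 7
        else:
            raise ValueError(f"unsupported command: {cmd!r}")
    return curves


# A single-segment stroke, then a stroke made of two chained segments.
program = "M 10 10 C 20 0 40 0 50 10 M 10 30 C 20 40 30 40 40 30 C 45 25 50 25 55 30"
for curve in parse_bezier_program(program):
    print(curve)
```

Keeping the program as plain text is what lets a language model emit it token by token; the parser then only has to check that each command carries the right number of coordinates and that segments chain from the current pen position.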

📝 Abstract

While Vision-Language Models (VLMs) have demonstrated strong semantic capabilities, their ability to interpret the underlying geometric structure of visual information is less explored. Pictographic characters, which combine visual form with symbolic structure, provide an ideal test case for this capability. We formulate this visual recognition challenge in the mathematical domain, where each character is represented by an executable program of geometric primitives. We frame it as a program synthesis task, training a VLM to decompile raster images into programs composed of Bézier curves. Our model, acting as a "visual decompiler", outperforms strong zero-shot baselines, including GPT-4o. The most significant finding is that, when trained solely on modern Chinese characters, the model can reconstruct ancient Oracle Bone Script zero-shot. This generalization provides strong evidence that the model acquires an abstract and transferable geometric grammar, moving beyond pixel-level pattern recognition to a more structured form of visual understanding.
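To make "executable program" concrete, here is a minimal sketch, not the authors' renderer, of executing such a program. Each cubic Bézier segment is evaluated in Bernstein form, B(t) = (1-t)^3 P0 + 3(1-t)^2 t P1 + 3(1-t) t^2 P2 + t^3 P3, and densely sampled points are marked on a binary grid. The 1-pixel stroke, the sample count, and the names `cubic_point` and `render` are assumptions for illustration. Decompilation runs this mapping in reverse: given the raster, recover the control points.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Cubic = Tuple[Point, Point, Point, Point]  # (start, control1, control2, end)


def cubic_point(curve: Cubic, t: float) -> Point:
    """Evaluate a cubic Bézier curve at t in [0, 1] using Bernstein weights."""
    (x0, y0), (x1, y1), (x2, y2), (x3, y3) = curve
    u = 1.0 - t
    w0, w1, w2, w3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    return (w0 * x0 + w1 * x1 + w2 * x2 + w3 * x3,
            w0 * y0 + w1 * y1 + w2 * y2 + w3 * y3)


def render(curves: List[Cubic], size: int, samples: int = 200) -> List[List[int]]:
    """'Execute' a Bézier program by rasterizing it onto a size x size binary grid.

    Simplification (assumption): strokes are 1 pixel wide and drawn by dense
    point sampling rather than by an anti-aliased stroke renderer.
    """
    grid = [[0] * size for _ in range(size)]
    for curve in curves:
        for k in range(samples + 1):
            x, y = cubic_point(curve, k / samples)
            # Image coordinates: x selects the column, y selects the row.
            col, row = int(round(x)), int(round(y))
            if 0 <= row < size and 0 <= col < size:
                grid[row][col] = 1
    return grid


line = ((0.0, 0.0), (3.0, 0.0), (6.0, 0.0), (9.0, 0.0))
for row in render([line], size=10)[:2]:
    print("".join("#" if v else "." for v in row))
```

Because the control points of `line` are evenly spaced along one axis, the curve is a straight stroke, so only row 0 of the 10x10 grid is inked. A real system would more likely use a vector-graphics library to draw strokes of the correct width.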

Problem

Reconstructing pictographic characters using Bézier curves

Training VLMs to decompile raster images into geometric programs

Testing geometric generalization from modern to ancient scripts
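One common way to score a reconstruction is intersection-over-union (IoU) between the rendered program and the target raster, with both treated as binary ink masks. This is an assumption: the summary does not state which metric the paper uses. The helper `raster_iou` below is a hypothetical sketch. A zero-shot Oracle Bone Script test could, for instance, average such a score over characters that never appeared in training.

```python
from typing import List

Grid = List[List[int]]  # binary mask: 1 = ink, 0 = background


def raster_iou(pred: Grid, target: Grid) -> float:
    """Intersection-over-union of two equally sized binary masks."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("masks must have the same shape")
    inter = union = 0
    for row_p, row_t in zip(pred, target):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks are treated as a perfect match (an assumed convention).
    return 1.0 if union == 0 else inter / union


target = [[1, 1, 1, 1, 0]]
pred = [[0, 1, 1, 1, 1]]
print(raster_iou(pred, target))
```

Here three ink pixels overlap out of five in the union, giving an IoU of 0.6. Thin 1-pixel strokes make raw IoU harsh toward small offsets, which is one reason evaluations often thicken strokes or add point-set distances.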

Innovation

Uses Bézier curves for character reconstruction

Trains a VLM to decompile raster images into Bézier-curve programs

Generalizes zero-shot from modern Chinese characters to ancient Oracle Bone Script