GeoSketch: A Neural-Symbolic Approach to Geometric Multimodal Reasoning with Auxiliary Line Construction and Affine Transformation

📅 2025-09-26
🤖 AI Summary
Geometric Problem Solving (GPS) requires models to reason jointly over text and diagrams while performing dynamic visual operations, such as constructing auxiliary lines or applying affine transformations. Current multimodal large language models (MLLMs) instead treat diagrams as static images and cannot manipulate them. This work introduces GeoSketch, a neural-symbolic framework for GPS that runs a closed loop of perception → symbolic reasoning → sketch action, in which auxiliary line construction and affine transformations are executable, verifiable operations that update the diagram. Training proceeds in two stages: supervised fine-tuning on curated symbolic trajectories, followed by reinforcement learning with dense symbolic rewards. On the newly constructed GeoSketch Benchmark of 390 problems, GeoSketch improves both stepwise reasoning accuracy and problem-solving success over static-perception MLLM baselines, which supports the importance of dynamic visual operations in geometric reasoning.

📝 Abstract
Geometric Problem Solving (GPS) poses a unique challenge for Multimodal Large Language Models (MLLMs), requiring not only the joint interpretation of text and diagrams but also iterative visuospatial reasoning. Existing approaches process diagrams as static images and lack the capacity for dynamic manipulation, a core aspect of human geometric reasoning that involves auxiliary line construction and affine transformations. We present GeoSketch, a neural-symbolic framework that recasts geometric reasoning as an interactive perception-reasoning-action loop. GeoSketch integrates: (1) a Perception module that abstracts diagrams into structured logic forms, (2) a Symbolic Reasoning module that applies geometric theorems to decide the next deductive step, and (3) a Sketch Action module that executes operations such as drawing auxiliary lines or applying transformations, thereby updating the diagram in a closed loop. To train this agent, we develop a two-stage pipeline: supervised fine-tuning on 2,000 symbolically curated trajectories, followed by reinforcement learning with dense symbolic rewards to enhance robustness and strategic exploration. To evaluate this paradigm, we introduce the GeoSketch Benchmark, a high-quality set of 390 geometry problems requiring auxiliary construction or affine transformations. Experiments against strong MLLM baselines demonstrate that GeoSketch significantly improves stepwise reasoning accuracy and problem-solving success over static perception methods. By unifying hierarchical decision-making, executable visual actions, and symbolic verification, GeoSketch advances multimodal reasoning from static interpretation to dynamic, verifiable interaction, establishing a new foundation for solving complex visuospatial problems.
Problem

Research questions and friction points this paper is trying to address.

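MLLMs must interpret text and diagrams jointly and reason iteratively over spatial structure to solve geometry problems
Existing approaches treat diagrams as static images and cannot manipulate them, for example by constructing auxiliary lines or applying affine transformations
Static perception leaves no interactive perception-reasoning-action loop in which the diagram is updated as reasoning proceeds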
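To make the two kinds of diagram manipulation concrete, here is a minimal Python sketch of an affine transformation and an auxiliary-line construction treated as executable operations whose results can be checked numerically. The function names (`affine`, `rotate_about`, `foot_of_perpendicular`) are hypothetical and chosen for illustration only; they are not taken from the paper and do not represent GeoSketch's actual action interface.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def affine(p: Point, a: float, b: float, c: float, d: float,
           tx: float, ty: float) -> Point:
    """Apply the affine map (x, y) -> (a*x + b*y + tx, c*x + d*y + ty)."""
    x, y = p
    return (a * x + b * y + tx, c * x + d * y + ty)


def rotate_about(p: Point, center: Point, degrees: float) -> Point:
    """Rotate p counterclockwise about center, written as an affine map."""
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    cx, cy = center
    # Translation part chosen so that `center` is a fixed point: c - R*c.
    tx = cx - cos_t * cx + sin_t * cy
    ty = cy - sin_t * cx - cos_t * cy
    return affine(p, cos_t, -sin_t, sin_t, cos_t, tx, ty)


def foot_of_perpendicular(p: Point, a: Point, b: Point) -> Point:
    """Auxiliary construction: foot of the perpendicular from p onto line ab."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    # Projection parameter of p onto the direction of ab.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    return (ax + t * dx, ay + t * dy)


# Illustrative triangle (made-up coordinates).
A, B, C = (0.0, 0.0), (4.0, 0.0), (1.0, 3.0)

# Auxiliary line: the altitude from C to AB meets AB at H.
H = foot_of_perpendicular(C, A, B)
dot = (C[0] - H[0]) * (B[0] - A[0]) + (C[1] - H[1]) * (B[1] - A[1])
assert math.isclose(dot, 0.0, abs_tol=1e-9)  # CH is perpendicular to AB

# Affine transformation: rotating C about A preserves the length AC.
C_rot = rotate_about(C, A, 90)
assert math.isclose(math.dist(A, C), math.dist(A, C_rot))
```

The assertions at the end play the role of a verification step: each operation produces a new diagram element, and a geometric property of that element (perpendicularity, preserved length) is checked rather than assumed.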
Innovation

Methods, ideas, or system contributions that make the work stand out.

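Neural-symbolic framework that recasts geometric reasoning as dynamic, interactive diagram manipulation
Perception-reasoning-action loop with three modules: Perception (diagram to logic forms), Symbolic Reasoning (theorem application), and Sketch Action (auxiliary lines and transformations)
Two-stage training: supervised fine-tuning on 2,000 curated symbolic trajectories, then reinforcement learning with dense symbolic rewards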
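The perception-reasoning-action loop can be illustrated with a toy Python sketch. Everything here is an assumption made for illustration: the `Diagram` class, the tuple-based logic forms, and the hard-coded rule in `decide` (which constructs the median from A in a triangle ABC) stand in for the learned Perception and Symbolic Reasoning modules, and are not the paper's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Diagram:
    points: dict                                 # name -> (x, y)
    segments: set = field(default_factory=set)   # frozenset({name, name})


def perceive(diagram: Diagram) -> set:
    """Stub perception: abstract the diagram into tuple-shaped logic forms."""
    facts = {("Point", name) for name in diagram.points}
    facts |= {("Segment", *sorted(seg)) for seg in diagram.segments}
    return facts


def decide(facts: set) -> tuple:
    """Stub reasoner: choose the next construction for the median from A."""
    if ("Point", "M") not in facts:
        return ("add_midpoint", "M", "B", "C")
    if ("Segment", "A", "M") not in facts:
        return ("draw_segment", "A", "M")
    return ("stop",)


def act(diagram: Diagram, action: tuple) -> None:
    """Sketch action: execute the chosen operation and update the diagram."""
    kind = action[0]
    if kind == "add_midpoint":
        _, name, p, q = action
        (x1, y1), (x2, y2) = diagram.points[p], diagram.points[q]
        diagram.points[name] = ((x1 + x2) / 2, (y1 + y2) / 2)
    elif kind == "draw_segment":
        _, p, q = action
        diagram.segments.add(frozenset((p, q)))
    else:
        raise ValueError(f"unknown action: {kind}")


def solve(diagram: Diagram, max_steps: int = 10) -> list:
    """Closed loop: perceive, decide, act, then perceive the updated diagram."""
    trace = []
    for _ in range(max_steps):
        action = decide(perceive(diagram))
        if action[0] == "stop":
            break
        act(diagram, action)
        trace.append(action)
    return trace


triangle = Diagram(
    points={"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (0.0, 4.0)},
    segments={frozenset(p) for p in (("A", "B"), ("B", "C"), ("A", "C"))},
)
for step in solve(triangle):
    print(step)
```

The point of the sketch is the control flow: each action changes the diagram, and the next decision is made from a fresh perception of the updated diagram rather than from the original static image.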
Authors
Shichao Weng
Fudan University, Shanghai, China
Zhiqiang Wang
iFLYTEK Co., Ltd.
Yuhua Zhou
Zhejiang University, Zhejiang, China
Rui Lu
The Hong Kong University of Science and Technology
Ting Liu
National University of Defense Technology, Hunan, China
Zhiyang Teng
Bytedance SG
Natural Language Processing
Xiaozhang Liu
Hainan University, Haikou, China
Hanmeng Liu
Associate Professor | Hainan University
Natural language processing