🤖 AI Summary
Geometric Problem Solving (GPS) requires models to jointly reason over text and diagrams while performing dynamic visual operations—such as constructing auxiliary lines or applying affine transformations—yet current multimodal large language models (MLLMs) treat diagrams as static images and lack interactive, executable visual action capabilities. This work introduces GeoSketch, the first neuro-symbolic framework for GPS, establishing a closed-loop pipeline of “perception → symbolic reasoning → differentiable drawing actions.” It formally models auxiliary line construction and affine transformations as executable, verifiable visual operations. The method employs a two-stage training strategy: symbolic-trajectory-supervised fine-tuning followed by symbolic-reward-guided reinforcement learning. Evaluated on the newly constructed GeoSketch benchmark, GeoSketch significantly outperforms state-of-the-art MLLMs, achieving substantial gains in both stepwise reasoning accuracy and final solution success rate—empirically demonstrating the critical role of dynamic visual operations in geometric reasoning.
📝 Abstract
Geometric Problem Solving (GPS) poses a unique challenge for Multimodal Large Language Models (MLLMs), requiring not only the joint interpretation of text and diagrams but also iterative visuospatial reasoning. Existing approaches process diagrams as static images and lack the capacity for dynamic manipulation, a core aspect of human geometric reasoning that involves auxiliary line construction and affine transformations. We present GeoSketch, a neural-symbolic framework that recasts geometric reasoning as an interactive perception-reasoning-action loop. GeoSketch integrates: (1) a Perception module that abstracts diagrams into structured logic forms, (2) a Symbolic Reasoning module that applies geometric theorems to decide the next deductive step, and (3) a Sketch Action module that executes operations such as drawing auxiliary lines or applying transformations, thereby updating the diagram in a closed loop. To train this agent, we develop a two-stage pipeline: supervised fine-tuning on 2,000 symbolically curated trajectories, followed by reinforcement learning with dense symbolic rewards to enhance robustness and strategic exploration. To evaluate this paradigm, we introduce the GeoSketch Benchmark, a high-quality set of 390 geometry problems that require auxiliary construction or affine transformations. Experiments against strong MLLM baselines demonstrate that GeoSketch significantly improves stepwise reasoning accuracy and problem-solving success over static perception methods. By unifying hierarchical decision-making, executable visual actions, and symbolic verification, GeoSketch advances multimodal reasoning from static interpretation to dynamic, verifiable interaction, establishing a new foundation for solving complex visuospatial problems.
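The closed loop described in the abstract (perceive the diagram as logic forms, reason symbolically about the next step, then execute a sketch action that updates the diagram) can be sketched in miniature as below. All class and function names, the string-based logic forms, and the toy heuristic are illustrative assumptions, not the paper's actual interfaces; a real system would query a theorem base rather than the stand-in rules used here.

```python
# Hypothetical sketch of a perception -> symbolic reasoning -> sketch action loop.
# Names and representations are assumptions for illustration only.
from dataclasses import dataclass, field

@dataclass
class DiagramState:
    """Mutable diagram: named points, segments, and derived facts."""
    points: set
    segments: set = field(default_factory=set)
    facts: list = field(default_factory=list)

def perceive(state: DiagramState) -> list:
    """Perception module: abstract the diagram into structured logic forms
    (here, plain strings such as 'segment(A,B)')."""
    return [f"segment({a},{b})" for (a, b) in sorted(state.segments)] + list(state.facts)

def reason(logic_forms: list, goal: str):
    """Symbolic reasoning module: decide the next step.
    A toy rule: stop if the goal is already derivable, otherwise
    propose an auxiliary-line sketch action."""
    if goal in logic_forms:
        return ("done", None)
    return ("sketch", ("draw_auxiliary", "A", "M"))

def act(state: DiagramState, action) -> DiagramState:
    """Sketch action module: execute a drawing operation, updating the diagram."""
    op, a, b = action
    if op == "draw_auxiliary":
        state.segments.add((a, b))
        state.facts.append(f"auxiliary({a},{b})")
    return state

def solve(state: DiagramState, goal: str, max_steps: int = 5) -> list:
    """Closed loop: perceive, reason, act until the goal fact is derived."""
    for _ in range(max_steps):
        logic = perceive(state)
        kind, action = reason(logic, goal)
        if kind == "done":
            return logic
        state = act(state, action)
        # Stand-in for a symbolic deduction step enabled by the new auxiliary line.
        state.facts.append(goal)
    return perceive(state)
```

The point of the sketch is the control flow, not the geometry: each sketch action mutates the diagram state, so the next perception pass sees the auxiliary construction and reasoning can proceed from the updated logic forms.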