SolidGeo: Measuring Multimodal Spatial Math Reasoning in Solid Geometry

📅 2025-05-27
🤖 AI Summary
Problem: Existing multimodal large language models (MLLMs) are predominantly evaluated on plane geometry, and there is no systematic benchmark for spatial reasoning, particularly in solid geometry, despite its higher cognitive demands.
Method: We introduce SolidGeo, the first large-scale benchmark for solid-geometry mathematical reasoning, comprising 3,113 K–12 and competition-level problems. Each item is accompanied by standardized 3D visual renderings (generated via CAD/3D engines), difficulty annotations, and fine-grained geometric categories (e.g., projection, net unfolding, spatial measurement, vector geometry). Benchmark construction combines pedagogical and geometric domain expertise through multi-expert collaborative annotation and rigorous human verification.
Contribution/Results: Evaluation of state-of-the-art MLLMs on SolidGeo reveals average accuracy below 40% of human performance, exposing critical bottlenecks in spatial imagination, multi-view transformation, and cross-modal alignment. This work establishes the first systematic, open multimodal evaluation framework for solid geometry, releasing the dataset and a comprehensive analysis to advance spatial intelligence research.

📝 Abstract
Geometry is a fundamental branch of mathematics and plays a crucial role in evaluating the reasoning capabilities of multimodal large language models (MLLMs). However, existing multimodal mathematics benchmarks mainly focus on plane geometry and largely ignore solid geometry, which requires spatial reasoning and is more challenging than plane geometry. To address this critical gap, we introduce SolidGeo, the first large-scale benchmark specifically designed to evaluate the performance of MLLMs on mathematical reasoning tasks in solid geometry. SolidGeo consists of 3,113 real-world K-12 and competition-level problems, each paired with visual context and annotated with difficulty levels and fine-grained solid geometry categories. Our benchmark covers a wide range of 3D reasoning subjects such as projection, unfolding, spatial measurement, and spatial vectors, offering a rigorous testbed for assessing solid geometry reasoning. Through extensive experiments, we observe that MLLMs encounter substantial challenges in solid geometry math tasks, with a considerable performance gap relative to human capabilities on SolidGeo. Moreover, we analyze the performance, inference efficiency, and error patterns of various models, offering insights into the solid geometry reasoning capabilities of MLLMs. We hope SolidGeo serves as a catalyst for advancing MLLMs toward deeper geometric reasoning and spatial intelligence.
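The abstract describes each problem as paired with visual context and annotated with a difficulty level and a fine-grained category, and it reports model performance broken down along these axes. A minimal Python sketch of how such annotations could drive a per-category and per-difficulty accuracy breakdown follows. Everything in it is a hypothetical illustration under stated assumptions: the `SolidGeoItem` schema and its field names, the integer difficulty scale, the whitespace/case-insensitive exact-match `normalize` rule, and the `accuracy_breakdown` helper are not the released SolidGeo format or the authors' evaluation code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SolidGeoItem:
    # Hypothetical record for one problem; not the released SolidGeo schema.
    item_id: str
    question: str
    image_path: str  # visual context paired with the problem
    answer: str      # gold answer string
    category: str    # e.g. "projection", "unfolding", "spatial measurement"
    difficulty: int  # hypothetical integer scale (e.g. 1 = easiest)


def normalize(ans: str) -> str:
    # Assumed exact-match normalization: lowercase and drop all whitespace.
    return "".join(ans.lower().split())


def accuracy_breakdown(items, predictions):
    """Overall accuracy plus accuracy per category and per difficulty level.

    `predictions` maps item_id -> model answer; missing ids count as wrong.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [correct, total]
    for item in items:
        pred = predictions.get(item.item_id, "")
        hit = int(normalize(pred) == normalize(item.answer))
        for key in ("overall",
                    f"category:{item.category}",
                    f"difficulty:{item.difficulty}"):
            counts[key][0] += hit
            counts[key][1] += 1
    return {key: correct / total for key, (correct, total) in counts.items()}


# Toy usage with made-up items and predictions (not SolidGeo data).
items = [
    SolidGeoItem("q1", "...", "q1.png", "12", "spatial measurement", 1),
    SolidGeoItem("q2", "...", "q2.png", "B", "projection", 2),
    SolidGeoItem("q3", "...", "q3.png", "8", "spatial measurement", 3),
]
predictions = {"q1": "12", "q2": "C", "q3": " 8 "}
result = accuracy_breakdown(items, predictions)
for key, acc in sorted(result.items()):
    print(f"{key}: {acc:.2f}")
```

Keying the counters by plain strings keeps the sketch dependency-free. A real harness would likely need a stronger answer checker (for example numeric or symbolic equivalence), since exact string matching undercounts answers that are mathematically equal but written differently.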
Problem

Research questions and friction points this paper is trying to address.

Evaluating MLLMs on solid geometry spatial reasoning
Addressing the lack of benchmarks for 3D math problems
Assessing performance gaps between MLLMs and humans

Innovation

Methods, ideas, or system contributions that make the work stand out.

First large-scale benchmark for solid geometry
Covers 3D reasoning subjects such as projection, unfolding, spatial measurement, and spatial vectors
Analyzes MLLM performance, inference efficiency, and error patterns

Authors

Peijie Wang
Institute of Automation, Chinese Academy of Sciences
Multimodal LLMs, math reasoning

Chao Yang
University of Electronic Science and Technology of China

Zhong-Zhi Li
MAIS, Institute of Automation of Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences

Fei Yin
MAIS, Institute of Automation of Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences

Dekang Ran
MAIS, Institute of Automation of Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences

Mi Tian
TAL

Zhilong Ji
TAL

Jinfeng Bai
TAL

Cheng-Lin Liu
MAIS, Institute of Automation of Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences