Artificial Intelligence for Mathematical Reasoning: An Integrated Survey of Language Models, Neuro-symbolic Systems, and Verified Discovery

📅 2026-06-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work surveys and evaluates advances in artificial intelligence for mathematical reasoning, from informal textual reasoning to formal theorem proving and mathematical discovery. It proposes the first unified framework that combines informal and formal reasoning, multi-agent collaboration, verification loops, and discovery mechanisms, and it introduces a four-dimensional taxonomy for analyzing the fragility of existing approaches. By reviewing mainstream techniques (chain-of-thought prompting, neuro-symbolic systems, autoformalization, and reinforcement learning with verifiable rewards) together with their associated benchmarks, it exposes problems in current evaluations such as saturation, data contamination, and bias. The paper advocates future research centered on verifiable discovery, reasoning efficiency, and accessible infrastructure.

📝 Abstract

Mathematical reasoning has long served as a stringent test of machine intelligence; over the past decade, it has moved from a niche problem within NLP to one of the most consequential AI frontiers. This survey provides a unified account of the field's evolution, from early rule-based math word problem (MWP) solvers and template-driven geometry systems, through neural expression generation and LLM prompting, to contemporary reasoning models, multi-agent systems, neuro-symbolic theorem provers, and verified discovery workflows. We organize the landscape along four axes: (i) informal reasoning over text and diagrams, spanning MWP solving, multimodal geometry, and vision-language models (VLMs); (ii) formal reasoning in proof assistants, including autoformalization, tactic prediction, compiler-guided repair, and proof search; (iii) mathematical discovery, where systems propose constructions, improve bounds, or assist attacks on open problems; and (iv) inference-time and training-time techniques, including chain-of-thought (CoT) prompting, tool use, process reward models, and reinforcement learning with verifiable rewards (RLVR), that increasingly connect generation with verification. We catalog major benchmarks across grade-school arithmetic, competition mathematics, geometry, formal proving, multimodal and multilingual reasoning, and expert evaluation, and we examine benchmark saturation, contamination, reporting mismatches, and the distinction between pass@1, majority voting, and verifier-assisted pass@$k$. We critically assess failure modes: brittleness under perturbation, reward hacking, multimodal grounding failures, fragile formalization, and the energy cost of reasoning-scale inference. Drawing on recent perspectives from working mathematicians, we identify future directions centered on verified-discovery workflows, reasoning efficiency, and infrastructure to make AI-assisted formalization broadly usable. Companion materials: https://github.com/Starscream-11813/awesome-AI4Math.
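
The abstract's distinction between pass@1, majority voting, and pass@$k$ can be illustrated with a minimal sketch. It is not taken from the survey. It assumes the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), computed from n sampled answers of which c are correct, and it treats majority voting as picking the most frequent final answer. The function names and the sample data are hypothetical. Under the further assumption of a perfect verifier, verifier-assisted pass@k would coincide with the pass@k value computed here, since the verifier could always select a correct sample when one exists.

```python
from collections import Counter
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the probability that a random
    # size-k subset of the n samples contains no correct answer.
    return 1.0 - comb(n - c, k) / comb(n, k)


def majority_vote(answers: list[str]) -> str:
    """Return the most frequent answer (ties go to the first one seen)."""
    return Counter(answers).most_common(1)[0][0]


# Hypothetical data: eight sampled final answers for one problem.
samples = ["41", "41", "41", "42", "42", "7", "13", "41"]
reference = "42"  # assumed ground-truth answer

n = len(samples)
c = sum(answer == reference for answer in samples)

print("pass@1:", pass_at_k(n, c, 1))
print("pass@4:", pass_at_k(n, c, 4))
print("majority vote correct:", majority_vote(samples) == reference)
```

On this toy data the three metrics disagree: a correct answer is likely to appear among four samples, yet the majority answer is wrong, which is why mixing these metrics in one results table produces the kind of reporting mismatch the abstract mentions.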

Problem

Research questions and friction points this paper is trying to address.

mathematical reasoning

formal verification

AI-assisted discovery

neuro-symbolic systems

language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

verified discovery

neuro-symbolic reasoning

autoformalization