LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models (LLMs) reason well about mathematics informally but struggle to produce proofs that a proof assistant can mechanically verify. The authors propose LEAP, an agentic framework that pairs the informal reasoning of general-purpose LLMs with continuous feedback from the Lean compiler. LEAP decomposes complex theorems into smaller, incrementally verifiable units and assembles complete proofs through instruction following and iterative self-refinement. The paper also introduces Lean-IMO-Bench, a benchmark of IMO-style problems formalized in Lean, and reports a verified proof of a key subproblem in Knuth's Hamiltonian decomposition of even-order Cayley graphs. On Lean-IMO-Bench, LEAP raises the one-shot formal solve rate of general-purpose LLMs from below 10% to 70%, surpassing the 48% achieved by a specialized, gold-medal-caliber IMO system, and it solves all 12 problems from the 2025 Putnam Competition.
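
To make "incrementally verifiable units" concrete, here is a toy Lean 4 sketch. The theorem, its name `toy_le_add_one`, and the split into steps are illustrative assumptions, not taken from the paper or from Lean-IMO-Bench. It only shows the general pattern of stating an intermediate fact with `have`, letting Lean check it, and then combining the checked pieces.

```lean
-- Hypothetical toy statement, chosen only to illustrate the pattern.
theorem toy_le_add_one (a b : Nat) (h : a ≤ b) : a ≤ b + 1 := by
  -- Intermediate unit: a small fact Lean can check on its own.
  have step : b ≤ b + 1 := Nat.le_succ b
  -- Final unit: chain the hypothesis with the verified intermediate fact.
  exact Nat.le_trans h step
```

Because each `have` is checked independently, a failure points at one small unit, not at the whole proof.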

📝 Abstract

Large Language Models (LLMs) exhibit strong informal mathematical reasoning but struggle to generate mechanically verifiable proofs in formal languages like Lean. We present LEAP, an agentic framework that enables general-purpose foundation models to achieve state-of-the-art performance on automated formal theorem proving. LEAP leverages foundation model capabilities, such as informal reasoning, instruction following, and iterative self-refinement. By decomposing complex problems into smaller units, the system bridges formal proof construction with informal blueprints through continuous interaction with the Lean compiler. To provide a rigorous evaluation beyond increasingly saturated benchmarks, we introduce Lean-IMO-Bench, a benchmark of IMO-style problems formalized in Lean, with short statements yet highly non-routine and multi-step proofs across a wide range of difficulty levels. Empirically, on the latest 2025 Putnam Competition, an annual mathematics competition for undergraduate students in North America, LEAP solves all 12 problems, matching recent breakthroughs by frontier formal mathematical models. On Lean-IMO-Bench, LEAP boosts the one-shot formal solve rate of general-purpose LLMs from below 10% to 70%, notably surpassing the 48% benchmark set by a specialized, gold-medal-caliber IMO system. Furthermore, we demonstrate LEAP's research-level utility by autonomously formalizing complex proofs for open combinatorial challenges, including a verified proof for a key subproblem in Knuth's Hamiltonian decomposition of even-order Cayley graphs.
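
The loop described in the abstract (decompose, propose, check with the compiler, refine) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not LEAP's implementation: `Subgoal`, `prove_with_refinement`, `prove_all`, `toy_propose`, and `toy_check` are all hypothetical names, the proposer stands in for an LLM call, and the checker stands in for the Lean compiler without invoking Lean at all.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical types: a proposer maps (subgoal, feedback) to a candidate proof,
# and a checker maps (subgoal, candidate) to (accepted, feedback).
Proposer = Callable[["Subgoal", str], str]
Checker = Callable[["Subgoal", str], Tuple[bool, str]]


@dataclass
class Subgoal:
    name: str
    statement: str
    proof: Optional[str] = None  # filled in once a candidate is accepted


def prove_with_refinement(goal: Subgoal, propose: Proposer, check: Checker,
                          max_rounds: int = 3) -> bool:
    """Propose a proof, check it, and retry with the checker's feedback."""
    feedback = ""
    for _ in range(max_rounds):
        candidate = propose(goal, feedback)
        accepted, feedback = check(goal, candidate)
        if accepted:
            goal.proof = candidate
            return True
    return False


def prove_all(subgoals: List[Subgoal], propose: Proposer, check: Checker,
              max_rounds: int = 3) -> bool:
    """Succeed only if every subgoal is proved; stop at the first failure."""
    return all(prove_with_refinement(g, propose, check, max_rounds)
               for g in subgoals)


def toy_propose(goal: Subgoal, feedback: str) -> str:
    # Stand-in for an LLM: the first attempt is incomplete, and the
    # attempt made after receiving feedback is a placeholder "fix".
    return "sorry" if not feedback else "exact placeholder"


def toy_check(goal: Subgoal, candidate: str) -> Tuple[bool, str]:
    # Stand-in for the Lean compiler: reject anything containing `sorry`.
    if "sorry" in candidate:
        return False, f"{goal.name}: proof contains sorry"
    return True, ""


if __name__ == "__main__":
    goals = [Subgoal("h1", "a <= b"), Subgoal("h2", "b <= c")]
    print(prove_all(goals, toy_propose, toy_check))
    print([g.proof for g in goals])
```

Keeping the proposer and checker as plain callables means the same loop could wrap a real model call and a real compiler invocation. How LEAP actually connects these components is not specified in this summary.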

Problem

Research questions and friction points this paper is trying to address.

formal theorem proving

large language models

mechanically verifiable proofs

formal mathematics

Lean

Innovation

Methods, ideas, or system contributions that make the work stand out.

agentic framework

formal theorem proving

Lean