Goedel-Architect: Streamlining Formal Theorem Proving with Blueprint Generation and Refinement

📅 2026-06-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses an inefficiency of mainstream formal theorem provers built on recursive lemma decomposition: without a global plan, they can loop on dead-end strategies. The authors propose a blueprint-centered agent framework that first constructs a proof blueprint, a dependency graph of formally stated definitions and lemmas leading to the main theorem, and then attempts to close the open lemma nodes in parallel, refining the blueprint when lemmas fail. The initial blueprint can optionally be guided by a natural-language proof. The system uses the open-weight DeepSeek-V4-Flash (284B-A13B) as its backbone together with a tool-equipped Lean 4 prover, forming a generate–prove–refine loop. Without natural-language seeding it reaches 99.2% pass@1 on MiniF2F-test and 75.6% pass@1 on PutnamBench. With seeding on the harder problems it reaches 100% on MiniF2F-test and 88.8% (597/672) on PutnamBench, and solves 4/6 problems on IMO 2025, 11/12 on Putnam 2025, and 3/6 on USAMO 2026. The authors report this as state-of-the-art for an open-source pipeline, at a cost up to 500x lower than comparable open-source pipelines.

📝 Abstract

We introduce Goedel-Architect, an agentic framework for formal theorem proving in Lean 4 centered on blueprint generation and refinement. A blueprint is a dependency graph of definitions and lemmas that builds up to the main theorem. First, Goedel-Architect generates a blueprint of formally stated definitions and lemmas, along with their declared dependencies. This blueprint is optionally guided by a natural-language proof. Then, a tool-equipped Lean prover component closes each open lemma node in parallel using the relevant dependencies. Failed lemmas in turn drive refinement of the global blueprint. This strategy contrasts with other mainstream approaches, which use recursive lemma decomposition and can loop inefficiently on dead-end strategies. Using the open-weight DeepSeek-V4-Flash (284B-A13B) as the backbone, Goedel-Architect attains 99.2% pass@1 on MiniF2F-test and 75.6% pass@1 on PutnamBench. With an optional natural-language proof seeding the initial blueprint on the harder problems, we additionally close the remaining two MiniF2F-test problems (reaching 100%), lift PutnamBench to 88.8% (597/672), and solve 4/6 on IMO 2025, 11/12 on Putnam 2025, and 3/6 on USAMO 2026. This represents state-of-the-art performance for an open-source pipeline, at a cost up to 500x lower than comparable open-source pipelines.
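The generate–prove–refine loop described in the abstract can be illustrated with a small Python sketch. This is not the authors' implementation: the `Node` class, `run_loop`, and the `toy_prove` and `toy_refine` stand-ins are all hypothetical names introduced here, and the real system would call an LLM and a Lean 4 prover where the toy functions use simple string checks.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Node:
    """One blueprint entry: a stated lemma plus its declared dependencies."""
    name: str
    statement: str
    deps: list = field(default_factory=list)
    proved: bool = False


def run_loop(blueprint, prove, refine, max_rounds=3):
    """Try all open nodes in parallel; refine the blueprint when any fail."""
    for _ in range(max_rounds):
        open_nodes = [n for n in blueprint.values() if not n.proved]
        if not open_nodes:
            break
        with ThreadPoolExecutor() as pool:
            # Each lemma is attempted independently, taking its deps as given.
            results = list(pool.map(lambda n: prove(n, blueprint), open_nodes))
        failed = []
        for node, ok in zip(open_nodes, results):
            node.proved = ok
            if not ok:
                failed.append(node.name)
        if failed:
            # Failures feed back into the global blueprint, not a local retry.
            blueprint = refine(blueprint, failed)
    solved = all(n.proved for n in blueprint.values())
    return solved, blueprint


def toy_prove(node, blueprint):
    # Hypothetical stand-in for a Lean prover call: "hard" statements fail.
    return not node.statement.startswith("hard")


def toy_refine(blueprint, failed):
    # Hypothetical stand-in for LLM refinement: restate each failed lemma
    # in an easier form that depends on a newly added helper lemma.
    new = dict(blueprint)
    for name in failed:
        helper = Node(name + "_helper", "easy helper")
        new[helper.name] = helper
        new[name] = Node(name, "easy restatement", blueprint[name].deps + [helper.name])
    return new


if __name__ == "__main__":
    bp = {
        "lem1": Node("lem1", "easy fact"),
        "main": Node("main", "hard goal", ["lem1"]),
    }
    solved, final = run_loop(bp, toy_prove, toy_refine)
    print(solved, sorted(final))
```

In this toy run, the main theorem fails in the first round, the refinement step adds a helper lemma and restates the theorem, and the second round closes every open node. The point of the sketch is the control flow: failures change the shared dependency graph rather than triggering a deeper recursive decomposition of the single failed lemma.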

Problem

Research questions and friction points this paper is trying to address.

formal theorem proving

blueprint generation

lemma decomposition

dependency graph

automated reasoning

Innovation

Methods, ideas, or system contributions that make the work stand out.

blueprint generation

formal theorem proving

dependency graph