REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning

📅 2025-05-27

🤖 AI Summary

Existing formal theorem provers perform well on high-school and competition-level mathematics but generalize poorly to university-level problems. This work introduces REAL-Prover, a new open-source, stepwise theorem prover for Lean 4 aimed at more advanced mathematics. It combines three components: (1) a retrieval-augmented design that integrates the retrieval system Leansearch-PS; (2) HERALD-AF, a data extraction pipeline that converts natural-language math problems into formal Lean 4 statements; and (3) Jixia-interactive, a new open-source Lean 4 interactive environment used to collect synthetic training data. The prover is built on the fine-tuned large language model REAL-Prover-v1. Using only supervised fine-tuning, it reaches 23.7% Pass@64 on ProofNet, comparable to state-of-the-art models, and achieves a state-of-the-art 56.7% Pass@64 on FATE-M, a newly introduced benchmark focused on algebraic problems.

📝 Abstract

Formal theorem provers have made substantial progress on high-school and competition-level mathematics, but few of them generalize to more advanced mathematics. In this paper, we present REAL-Prover, a new open-source stepwise theorem prover for Lean 4 that pushes this boundary. The prover, based on our fine-tuned large language model (REAL-Prover-v1) and integrated with a retrieval system (Leansearch-PS), notably boosts performance on college-level mathematics problems. To train REAL-Prover-v1, we developed HERALD-AF, a data extraction pipeline that converts natural-language math problems into formal statements, and a new open-source Lean 4 interactive environment (Jixia-interactive) to facilitate synthetic data collection. In our experiments, our prover, using only supervised fine-tuning, achieves a 23.7% success rate (Pass@64) on the ProofNet dataset, comparable to state-of-the-art (SOTA) models. To further evaluate our approach, we introduce FATE-M, a new benchmark focused on algebraic problems, on which our prover achieves a SOTA success rate of 56.7% (Pass@64).
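
The Pass@64 figures above can be read as "the share of problems for which at least one of 64 attempts produced a verified proof". The sketch below illustrates that reading only; the abstract does not spell out the exact evaluation protocol, so the function name `pass_at_k`, the input layout, and the toy data are all assumptions made for illustration.

```python
from typing import Sequence


def pass_at_k(attempts: Sequence[Sequence[bool]], k: int) -> float:
    """Fraction of problems solved by at least one of the first k attempts.

    attempts[i][j] is True if attempt j on problem i yielded a proof
    accepted by the proof checker (hypothetical layout, not from the paper).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if not attempts:
        raise ValueError("need at least one problem")
    solved = sum(1 for results in attempts if any(results[:k]))
    return solved / len(attempts)


# Made-up outcomes for four problems, three attempts each.
toy = [
    [False, True, False],
    [False, False, False],
    [True, True, False],
    [False, False, True],
]

rate_k2 = pass_at_k(toy, k=2)
rate_k3 = pass_at_k(toy, k=3)
```

Under this reading, a larger k can only keep the score the same or raise it, which is why Pass@64 results are comparable only to other results reported at the same sampling budget.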

Problem

Research questions and friction points this paper is trying to address.

Develops REAL-Prover for advanced math theorem proving

Integrates retrieval system to enhance college-level math solving

Introduces a new benchmark (FATE-M) for evaluating performance on algebraic problems

Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned large language model for Lean 4

Retrieval system integration for math problems

New data extraction pipeline for formal statements
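
To make "stepwise" proving in Lean 4 concrete, here is a small hand-written example in which each tactic line is one proof step of the kind a stepwise prover would generate in sequence. The statement and proof are illustrative only and are not taken from the paper or its benchmarks; the example assumes Mathlib is available, and `inv_mul_cancel_left` and `mul_one` are the sort of library lemmas a retrieval system could surface for such a goal.

```lean
import Mathlib

-- Illustrative statement: in a group, a right inverse of `a` equals `a⁻¹`.
example (G : Type*) [Group G] (a b : G) (h : a * b = 1) : b = a⁻¹ := by
  -- Step 1: multiply both sides of `h` on the left by `a⁻¹`.
  have h1 : a⁻¹ * (a * b) = a⁻¹ * 1 := by rw [h]
  -- Step 2: simplify both sides using library lemmas.
  rw [inv_mul_cancel_left, mul_one] at h1
  -- Step 3: `h1` is now exactly the goal.
  exact h1
```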

🔎 Similar Papers

A Semantic Search Engine for Mathlib4