REAL-Prover: Retrieval Augmented Lean Prover for Mathematical Reasoning

📅 2025-05-27
🤖 AI Summary
Existing formal theorem provers perform well on high-school and competition-level mathematics but generalize poorly to university-level problems. This work introduces REAL-Prover, a new open-source stepwise theorem prover for Lean 4 aimed at more advanced mathematics. It rests on three components: (1) a retrieval-augmented design that integrates the retrieval system Leansearch-PS; (2) HERALD-AF, a data extraction pipeline that converts natural-language math problems into formal statements; and (3) Jixia-interactive, a new open-source Lean 4 interactive environment that supports synthetic data collection. The prover is built on the fine-tuned large language model REAL-Prover-v1. Using only supervised fine-tuning, it achieves 23.7% Pass@64 on ProofNet, comparable to state-of-the-art models, and a state-of-the-art 56.7% Pass@64 on FATE-M, a newly introduced benchmark focused on algebraic problems.

📝 Abstract
Formal theorem provers have recently made substantial progress on high-school and competition-level mathematics, but few of them generalize to more advanced mathematics. In this paper, we present REAL-Prover, a new open-source stepwise theorem prover for Lean 4 to push this boundary. This prover, based on our fine-tuned large language model (REAL-Prover-v1) and integrated with a retrieval system (Leansearch-PS), notably boosts performance on college-level mathematics problems. To train REAL-Prover-v1, we developed HERALD-AF, a data extraction pipeline that converts natural language math problems into formal statements, and a new open-source Lean 4 interactive environment (Jixia-interactive) to facilitate synthetic data collection. In our experiments, our prover, using only supervised fine-tuning, achieves competitive results with a 23.7% success rate (Pass@64) on the ProofNet dataset, comparable to state-of-the-art (SOTA) models. To further evaluate our approach, we introduce FATE-M, a new benchmark focused on algebraic problems, where our prover achieves a SOTA success rate of 56.7% (Pass@64).
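The Pass@64 figures above count a problem as solved when at least one of 64 proof attempts succeeds. The following is a minimal sketch of that bookkeeping, assuming each attempt has already been checked and recorded as a boolean. The function name `pass_at_k` and the toy data are illustrative assumptions; the paper's exact evaluation protocol is not described in this summary.

```python
from typing import Dict, List


def pass_at_k(results: Dict[str, List[bool]], k: int) -> float:
    """Fraction of problems with at least one success among the first k attempts."""
    if not results:
        raise ValueError("no problems given")
    solved = 0
    for problem, attempts in results.items():
        if len(attempts) < k:
            raise ValueError(f"{problem} has fewer than {k} attempts")
        # A problem counts as solved if any of its first k attempts succeeded.
        if any(attempts[:k]):
            solved += 1
    return solved / len(results)


# Hypothetical toy data: three problems, four attempts each.
toy = {
    "p1": [False, True, False, False],
    "p2": [False, False, False, False],
    "p3": [False, False, False, True],
}

print(pass_at_k(toy, 4))  # p1 and p3 are solved within four attempts
print(pass_at_k(toy, 2))  # only p1 is solved within two attempts
```

Under this definition, a larger k can only keep or raise the score, which is why the attempt budget (here 64) has to be reported alongside the success rate.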
Problem

Research questions and friction points this paper is trying to address.

Develops REAL-Prover for advanced math theorem proving
Integrates retrieval system to enhance college-level math solving
Introduces FATE-M, a new benchmark for evaluating performance on algebraic problems
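As a rough illustration of how a retrieval system can feed a stepwise prover, the sketch below ranks candidate premises by cosine similarity to the current proof state and places the top hits in the prompt for the next tactic. It assumes an embedding-based retriever, which is an assumption made here for illustration. The internals of Leansearch-PS are not described in this summary, the vectors are hand-written toy values rather than real embeddings, and `retrieve` and `build_prompt` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def retrieve(
    query_vec: Sequence[float],
    premise_vecs: Dict[str, Sequence[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Return the top_k premises ranked by similarity to the query vector."""
    scored = [(name, cosine(query_vec, vec)) for name, vec in premise_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


def build_prompt(proof_state: str, premises: List[str]) -> str:
    """Assemble a prompt for the next-tactic model (hypothetical format)."""
    lines = ["Relevant premises:"]
    lines += [f"- {name}" for name in premises]
    lines += ["Proof state:", proof_state, "Next tactic:"]
    return "\n".join(lines)


# Toy premise vectors (assumed values, not real embeddings).
toy_premises = {
    "mul_comm": [1.0, 0.0, 0.0],
    "add_zero": [0.0, 1.0, 0.0],
    "mul_one": [0.8, 0.0, 0.6],
}
toy_query = [0.9, 0.1, 0.0]  # stands in for an embedded proof state

hits = retrieve(toy_query, toy_premises, top_k=2)
names = [name for name, _ in hits]
prompt = build_prompt("a b : Nat |- a * b = b * a", names)
print(prompt)
```

In a stepwise setting this retrieval would be repeated at every proof state, so the premises shown to the model change as the goal changes.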
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned large language model for Lean 4
Retrieval system integration for math problems
New data extraction pipeline for formal statements
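To make "formal statement" concrete, here is an illustrative Lean 4 statement of the kind a natural-language-to-Lean pipeline would target. The informal problem is: "Let G be a group in which every element g satisfies g * g = 1. Show that G is abelian." The theorem name and the choice of problem are assumptions made for illustration; this is not taken from HERALD-AF output or from FATE-M, and it assumes Mathlib is available.

```lean
import Mathlib

-- Informal: if every element of a group squares to the identity,
-- then any two elements commute.
theorem mul_comm_of_sq_eq_one {G : Type*} [Group G]
    (h : ∀ g : G, g * g = 1) (a b : G) : a * b = b * a := by
  sorry -- proof intentionally left open
```

The proof is left as `sorry` on purpose: the statement is what a formalization pipeline produces, while a stepwise prover would then try to replace the `sorry` one tactic at a time.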
Authors

Ziju Shen
School of Mathematical Sciences, Peking University
reinforcement learning, machine learning

Naohao Huang
Renmin University of China

Fanyi Yang
Peking University
LLM

Yutong Wang
National University of Singapore

Guoxiong Gao
Peking University

Tianyi Xu
Tulane University
Reinforcement Learning, Network Optimization, Statistics, NLP (LLM), Operations Research

Jiedong Jiang
Peking University

Wanyi He
Peking University

Pu Yang
Peking University

Mengzhou Sun
National University of Singapore

Haocheng Ju
Peking University

Peihao Wu
Ubiquant

Bin Dong
Beijing International Center for Mathematical Research and the New Cornerstone Science Laboratory, Peking University; Center for Machine Learning Research, Peking University; Center for Intelligent Computing, Great Bay Institute for Advanced Study, Great Bay University