Alchemy: Amplifying Theorem-Proving Capability through Symbolic Mutation

📅 2024-10-21
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the severe scarcity of formalized data in neural theorem proving (NTP), this paper proposes a symbolic mutation–driven theorem synthesis framework grounded in Mathlib. It systematically generates high-quality formal theorems at scale via equivalence-preserving symbol substitution and antecedent instantiation, and it pioneers the use of synthesized theorems for both continual pretraining and supervised fine-tuning of large language models. The framework yields the largest publicly available formal theorem corpus to date, 6 million theorems, an expansion of more than an order of magnitude over the original Mathlib (about 110,000 theorems). Empirical evaluation demonstrates substantial improvements: +4.70 percentage points of absolute accuracy on the LeanDojo benchmark and +2.47 on the out-of-distribution miniF2F benchmark, confirming meaningful gains in generalization and formal reasoning performance.

📝 Abstract
Formal proofs are challenging to write even for experienced experts. Recent progress in Neural Theorem Proving (NTP) shows promise in expediting this process. However, the formal corpora available on the Internet are limited compared to general text, posing a significant data scarcity challenge for NTP. To address this issue, this work proposes Alchemy, a general framework for data synthesis that constructs formal theorems through symbolic mutation. Specifically, for each candidate theorem in Mathlib, we identify all invocable theorems that can be used to rewrite or apply to it. Subsequently, we mutate the candidate theorem by replacing the corresponding term in the statement with its equivalent form or antecedent. As a result, our method increases the number of theorems in Mathlib by an order of magnitude, from 110k to 6M. Furthermore, we perform continual pretraining and supervised finetuning on this augmented corpus for large language models. Experimental results demonstrate the effectiveness of our approach, achieving a 4.70% absolute performance improvement on the LeanDojo benchmark. Additionally, our approach achieves a 2.47% absolute performance gain on the out-of-distribution miniF2F benchmark based on the synthetic data. To provide further insights, we conduct a comprehensive analysis of synthetic data composition and the training paradigm, offering valuable guidance for developing a strong theorem prover.
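The two mutation operations the abstract describes can be sketched in Lean 4. This is a minimal illustration, not the paper's pipeline: the theorem names `cand`, `cand_comm`, and `cand_ante` are hypothetical, and only standard `Nat` lemmas are used, whereas Alchemy applies the idea across all of Mathlib.

```lean
-- Candidate theorem (Mathlib-style):
theorem cand (n : Nat) : n + 0 = n := Nat.add_zero n

-- (1) Equivalent-form substitution: replace a term in the statement with
-- an equal term (here `n + 0` becomes `0 + n` via `Nat.add_comm`); the
-- proof is repaired by rewriting the mutated term back first.
theorem cand_comm (n : Nat) : 0 + n = n := by
  rw [Nat.add_comm]
  exact Nat.add_zero n

-- (2) Antecedent instantiation (loosely): a term in the statement is
-- reached through a hypothesis that implies it, so the new theorem
-- carries the antecedent as a premise and discharges it by rewriting.
theorem cand_ante (n m : Nat) (h : m = n + 0) : m = n := by
  rw [h]
  exact Nat.add_zero n
```

Each mutated statement remains provable by construction, which is what lets the framework emit millions of valid theorem–proof pairs for training.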
Problem

Research questions and friction points this paper is trying to address.

Addresses data scarcity in Neural Theorem Proving (NTP) via symbolic mutation.
Enhances theorem corpus from 110k to 6M by mutating Mathlib theorems.
Improves theorem-proving performance by 4.70% on the LeanDojo benchmark.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Symbolic mutation synthesizes formal theorems
Augments Mathlib theorems from 110k to 6M
Continual pretraining boosts theorem-proving performance
Shaonan Wu
National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University
Shuai Lu
Microsoft Research Asia
Yeyun Gong
Microsoft Research Asia
Natural Language Generation, Question Answering, Pre-training
Nan Duan
JD.Com (now) | StepFun | Microsoft Research
NLP, Artificial General Intelligence
Ping Wei
Fudan University
Multimedia security, Image synthesis