🤖 AI Summary
Problem: Existing large language models (LLMs) for formal theorem proving are trained or fine-tuned on static datasets, so they generalize poorly to advanced mathematics and cannot learn continually across mathematical domains without forgetting previously learned knowledge. Method: We propose LeanAgent, a lifelong learning framework for formal theorem proving with the interactive proof assistant Lean. It combines a curriculum learning strategy that orders the learning trajectory by mathematical difficulty, a dynamic database for managing evolving mathematical knowledge, and progressive training that balances stability and plasticity. Contribution/Results: Across 23 diverse Lean repositories, LeanAgent generates formal proofs for 155 theorems whose proofs were previously missing, many from advanced mathematics, including abstract algebra and algebraic topology. It performs significantly better than the static LLM baseline and scores strongly on the lifelong learning metrics of stability and backward transfer, meaning that learning new tasks improves performance on previously learned ones.
📝 Abstract
Large Language Models (LLMs) have been successful in mathematical reasoning tasks such as formal theorem proving when integrated with interactive proof assistants like Lean. Existing approaches involve training or fine-tuning an LLM on a specific dataset to perform well on particular domains, such as undergraduate-level mathematics. These methods struggle with generalizability to advanced mathematics. A fundamental limitation is that these approaches operate on static domains, failing to capture how mathematicians often work across multiple domains and projects simultaneously or cyclically. We present LeanAgent, a novel lifelong learning framework for formal theorem proving that continuously generalizes to and improves on ever-expanding mathematical knowledge without forgetting previously learned knowledge. LeanAgent introduces several key innovations, including a curriculum learning strategy that optimizes the learning trajectory in terms of mathematical difficulty, a dynamic database for efficient management of evolving mathematical knowledge, and progressive training to balance stability and plasticity. LeanAgent successfully generates formal proofs for 155 theorems across 23 diverse Lean repositories where formal proofs were previously missing, many from advanced mathematics. It performs significantly better than the static LLM baseline, proving challenging theorems in domains like abstract algebra and algebraic topology while showcasing a clear progression of learning from basic concepts to advanced topics. In addition, we analyze LeanAgent's superior performance on key lifelong learning metrics. LeanAgent achieves exceptional scores in stability and backward transfer, where learning new tasks improves performance on previously learned tasks. This emphasizes LeanAgent's continuous generalizability and improvement, explaining its superior theorem-proving performance.
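To make the backward transfer metric mentioned above concrete, the following is a minimal sketch of one definition commonly used in the continual learning literature: the average change in performance on each earlier task between the moment it was first learned and the end of training. The abstract does not state the exact formula LeanAgent uses, so this definition, the function name `backward_transfer`, and the numbers in the matrix `R` are illustrative assumptions, not results from the paper.

```python
def backward_transfer(R):
    """Average change in performance on earlier tasks after all training.

    R[i][j] is the performance on task j measured right after training on
    task i (tasks are learned in order 0, 1, ..., T-1). This layout is an
    assumption made for this sketch.
    """
    T = len(R)
    if T < 2:
        raise ValueError("backward transfer needs at least two tasks")
    # Compare final performance on task j with performance right after
    # task j itself was learned, for every task except the last one.
    deltas = [R[T - 1][j] - R[j][j] for j in range(T - 1)]
    return sum(deltas) / (T - 1)


# Made-up scores for three tasks (for example, three repositories).
R = [
    [0.50, 0.00, 0.00],  # after training on task 0
    [0.55, 0.40, 0.00],  # after training on task 1
    [0.60, 0.45, 0.30],  # after training on task 2
]
bwt = backward_transfer(R)
print(f"backward transfer: {bwt:.3f}")
```

Under this definition, a positive value means that later training improved performance on earlier tasks, a negative value indicates forgetting, and a value near zero indicates that earlier performance was simply retained.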