🤖 AI Summary
This work addresses the challenge of automatically formalizing mathematical statements from quantum computing literature by introducing the first end-to-end, fully automated agent framework. The system extracts theorems directly from LaTeX source files, translates them into verifiable Lean 4 code based on Mathlib, and then back-translates the formalized results into human-readable LaTeX to facilitate semantic review. Integrating an agent-based architecture with bidirectional translation between natural and formal languages, LaTeX parsing, and semantic alignment techniques, the approach formalizes 114 mathematical statements from three quantum computing papers, generating 2,050 Lean declarations. The process is fully automated: human intervention is required only to validate newly introduced definitions and axioms. This significantly lowers the barrier to formal verification, provides high-quality synthetic data for training reasoning models, and is readily extensible to mathematics and theoretical physics.
📝 Abstract
We introduce MerLean, a fully automated agentic framework for autoformalization in quantum computation. MerLean extracts mathematical statements from LaTeX source files, formalizes them into verified Lean 4 code built on Mathlib, and translates the result back into human-readable LaTeX for semantic review. We evaluate MerLean on three theoretical quantum computing papers, producing 2,050 Lean declarations from 114 statements in total. MerLean achieves end-to-end formalization on all three papers, reducing the verification burden to only the newly introduced definitions and axioms. Our results demonstrate that agentic autoformalization can scale to frontier research, offering both a practical tool for machine-verified peer review and a scalable engine for mining high-quality synthetic data to train future reasoning models. Our approach can also be generalized to any other rigorous research in mathematics and theoretical physics.
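To make the first stage of the pipeline concrete, the following is a minimal sketch of pulling theorem-like statements out of LaTeX source. It is not MerLean's implementation, which the abstract does not describe at this level. The environment names in `THEOREM_ENVS`, the function `extract_statements`, and the `SAMPLE` text are all hypothetical, and the regex assumes unstarred, non-nested environments.

```python
import re

# Assumed environment names; real papers declare their own via \newtheorem.
THEOREM_ENVS = ("theorem", "lemma", "proposition", "corollary", "definition")

# Matches \begin{env}[optional title] ... \end{env} for the names above.
_ENV_PATTERN = re.compile(
    r"\\begin\{(?P<env>" + "|".join(THEOREM_ENVS) + r")\}"
    r"(?:\[(?P<title>[^\]]*)\])?"
    r"(?P<body>.*?)"
    r"\\end\{(?P=env)\}",
    re.DOTALL,
)


def extract_statements(latex_source):
    """Return theorem-like statements in document order (illustrative only)."""
    statements = []
    for match in _ENV_PATTERN.finditer(latex_source):
        statements.append(
            {
                "kind": match.group("env"),
                "title": match.group("title"),  # None when no [title] is given
                # Collapse line breaks and runs of spaces into single spaces.
                "body": " ".join(match.group("body").split()),
            }
        )
    return statements


# Hypothetical input, written only for this example.
SAMPLE = r"""
\begin{definition}[Pauli group]
The Pauli group on $n$ qubits is generated by $X_i$ and $Z_i$.
\end{definition}
Some connecting prose.
\begin{lemma}
Any two Pauli operators either commute or anticommute.
\end{lemma}
"""

if __name__ == "__main__":
    for statement in extract_statements(SAMPLE):
        print(statement["kind"], "|", statement["title"], "|", statement["body"])
```

Each extracted record would then be handed to the formalization step. A regex is enough for a sketch, but custom macros, starred environments, and nested environments would call for a proper LaTeX parser.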