Neural Theorem Proving: Generating and Structuring Proofs for Formal Verification

📅 2025-04-23
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the challenge of verifying the correctness of code generated by large language models (LLMs), this paper proposes an end-to-end neural theorem proving framework. First, the code to be verified is translated into a natural-language statement; second, a two-stage fine-tuned LLM, trained with supervised fine-tuning (SFT) followed by PPO-based reinforcement learning, generates formal proofs in Isabelle/HOL; third, a heuristic module builds and verifies the final proof. Together these form a three-stage pipeline: natural language → formal proof → structured verification. Experiments on the miniF2F-test benchmark show improved proof success rates, and a case study applies the framework to formally verify AWS S3 bucket access-policy code. The authors also curate a dataset based on FVEL_ER for future training tasks.
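The three-stage pipeline summarized above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: every function name, prompt, and stubbed return value is a hypothetical placeholder standing in for an LLM call or an Isabelle invocation.

```python
# Hypothetical sketch of the "natural language -> formal proof -> structured
# verification" pipeline. All names and stub bodies are illustrative only.

def describe_code(code: str) -> str:
    """Stage 1: translate the code under verification into a natural-language
    statement (in the paper, via an LLM). Stubbed here."""
    return f"Statement of the property to verify for: {code}"

def generate_proof(statement: str) -> str:
    """Stage 2: a fine-tuned LLM emits a candidate Isabelle/HOL proof.
    Stubbed with a fixed placeholder proof."""
    return 'theorem example_property: "True" by auto'

def verify(proof: str) -> bool:
    """Stage 3: structured verification -- hand the candidate proof to the
    Isabelle checker and report success. Stubbed with a string check."""
    return proof.endswith("by auto")

def pipeline(code: str) -> bool:
    statement = describe_code(code)
    proof = generate_proof(statement)
    return verify(proof)

print(pipeline("def f(xs): return sorted(xs)"))
```

With these stubs the pipeline trivially succeeds; in the real system each stage would call out to a model or to the Isabelle process.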

📝 Abstract
Formally verifying properties of software code has been a highly desirable task, especially with the emergence of LLM-generated code. In the same vein, LLMs provide an interesting avenue for the exploration of formal verification and mechanistic interpretability. Since the introduction of code-specific models, despite their successes in generating code in Lean4 and Isabelle, the task of generalized theorem proving still remains far from being fully solved and will be a benchmark for reasoning capability in LLMs. In this work, we introduce a framework that generates whole proofs in a formal language to be used within systems that utilize the power of built-in tactics and off-the-shelf automated theorem provers. Our framework includes three components: generating natural language statements of the code to be verified, an LLM that generates formal proofs for the given statement, and a module employing heuristics for building the final proof. To train the LLM, we employ a two-stage fine-tuning process, where we first use SFT-based training to enable the model to generate syntactically correct Isabelle code and then RL-based training that encourages the model to generate proofs verified by a theorem prover. We validate our framework using the miniF2F-test benchmark and the Isabelle proof assistant and design a use case to verify the correctness of the AWS S3 bucket access policy code. We also curate a dataset based on the FVEL_ER dataset for future training tasks.
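The two-stage recipe in the abstract (SFT for syntactic correctness, then RL toward prover-verified proofs) hinges on a reward signal derived from the theorem prover. A minimal sketch of such a reward function, assuming a hypothetical `isabelle_check` oracle and reward values chosen purely for illustration:

```python
def isabelle_check(proof: str) -> bool:
    """Hypothetical oracle: True iff the Isabelle checker accepts the proof.
    A real implementation would run the prover as a subprocess; this is a
    stand-in string heuristic."""
    return proof.strip().endswith("qed")

def proof_reward(proof: str, syntactically_valid: bool) -> float:
    """Shaped reward for PPO-style training: penalize malformed output,
    give partial credit for parsable Isabelle, and full credit only when
    the prover verifies the proof. The exact values are assumptions."""
    if not syntactically_valid:
        return -1.0   # malformed output: discourage
    if isabelle_check(proof):
        return 1.0    # verified proof: full reward
    return 0.1        # parses but does not verify: small credit

print(proof_reward("proof - show ?thesis by simp qed", True))  # 1.0
```

The SFT stage shifts mass onto the "syntactically valid" branch, so the RL stage mostly distinguishes the 0.1 and 1.0 cases, i.e. it optimizes for verifiability rather than surface form.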
Problem

Research questions and friction points this paper is trying to address.

Developing a framework for generating formal proofs in theorem proving
Enhancing LLMs' capability in generalized theorem proving for verification
Verifying correctness of software code using automated theorem provers
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates formal proofs using LLMs
Employs two-stage fine-tuning process
Combines heuristics with automated theorem provers