🤖 AI Summary
This work addresses the high barrier to formal verification by tackling three core challenges in LLM-based automated formal modeling: (1) the semantic gap between natural language and formal logic, (2) the complexity of formal specification languages, and (3) hallucination risks in model generation. To this end, we propose a planner- and counterexample-guided repair framework for LLM agents: it enables end-to-end translation from natural-language system descriptions to executable formal models in PAT via semantics-aware prompting, and iteratively refines outputs using counterexamples from model checking. The framework empowers non-expert users to specify, customize, and verify system behavior. Experiments on 40 benchmarks demonstrate significant improvements in both verification success rate and efficiency over baselines. Ablation studies confirm the critical contributions of the planning and repair modules, and a user study further validates substantial gains in usability.
📝 Abstract
Recent advances in large language models (LLMs) offer promising potential for automating formal methods. However, applying them to formal verification remains challenging due to the complexity of specification languages, the risk of hallucinated outputs, and the semantic gap between natural language and formal logic. We introduce PAT-Agent, an end-to-end framework for natural-language autoformalization and formal model repair that combines the generative capabilities of LLMs with the rigor of formal verification to automate the construction of verifiable formal models. In PAT-Agent, a Planning LLM first extracts key modeling elements and generates a detailed plan using semantic prompts, which then guides a Code Generation LLM to synthesize syntactically correct and semantically faithful formal models. The resulting code is verified by the Process Analysis Toolkit (PAT) model checker against user-specified properties, and when discrepancies occur, a Repair Loop is triggered to iteratively correct the model using counterexamples. To improve flexibility, we build a web-based interface that enables users, particularly non-FM experts, to describe, customize, and verify system behaviors through user-LLM interactions. Experimental results on 40 systems show that PAT-Agent consistently outperforms baselines, achieving high verification success with superior efficiency. Ablation studies confirm the importance of both the planning and repair components, and a user study demonstrates that our interface is accessible and supports effective formal modeling, even for users with limited formal methods experience.
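The plan → generate → verify → repair pipeline described in the abstract can be sketched at a high level as follows. This is an illustrative outline only: every name here (`planning_llm`, `codegen_llm`, `pat_check`, `repair_llm`, the iteration budget) is a hypothetical stub standing in for the framework's actual components, and the stubbed model checker simply simulates a failure followed by a successful repair.

```python
MAX_REPAIR_ROUNDS = 5  # assumed budget for the counterexample-guided Repair Loop

def planning_llm(description):
    # Stub for the Planning LLM: extract key modeling elements into a plan.
    return {"elements": description.split()}

def codegen_llm(plan):
    # Stub for the Code Generation LLM: emit a (fake) formal model string.
    return "System = init -> Skip; // " + ",".join(plan["elements"])

def pat_check(model, properties):
    # Stub for the PAT model checker: fail until the model has been repaired,
    # returning a counterexample trace on failure.
    ok = "repaired" in model
    counterexample = None if ok else "<init -> violating trace>"
    return ok, counterexample

def repair_llm(model, counterexample):
    # Stub repair step: fold the counterexample back into a corrected model.
    return model + " /* repaired using " + counterexample + " */"

def pat_agent(description, properties):
    plan = planning_llm(description)        # Planning LLM
    model = codegen_llm(plan)               # Code Generation LLM
    for _ in range(MAX_REPAIR_ROUNDS):      # Repair Loop
        ok, cex = pat_check(model, properties)
        if ok:
            return model                    # verified against the properties
        model = repair_llm(model, cex)
    return None                             # budget exhausted without success

result = pat_agent("traffic light controller", ["deadlockfree"])
```

In this toy run, the first check fails, one repair round incorporates the counterexample, and the second check succeeds, mirroring the iterative refinement the abstract describes.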