🤖 AI Summary
This work addresses the lack of formal methods for verifying the fidelity of natural language outputs, such as Gherkin scenarios, generated by large language models (LLMs). To this end, we propose a logic-based consistency verification pipeline grounded in autoformalization: an LLM-based autoformalizer translates both natural language requirements and LLM-generated outputs into first-order logic formulas, and formal reasoning is then applied to assess semantic equivalence and detect logical contradictions. Two experiments demonstrate that the approach can identify semantic equivalence across paraphrased requirements and uncover a logical inconsistency between a requirement and a generated output, improving the trustworthiness of generated artifacts. While preliminary, these results suggest that autoformalization provides a promising foundation, both conceptual and practical, for verifying LLM-generated artifacts in requirements engineering.
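As a concrete illustration of the equivalence check described above, the sketch below encodes two paraphrased requirements as propositional formulas and tests equivalence by exhaustive truth-table enumeration. This is a stand-in for the first-order reasoning the pipeline performs, and the predicates and requirement wordings are hypothetical, not taken from the paper's experiments.

```python
from itertools import product

def implies(p, q):
    # Material implication: p -> q
    return (not p) or q

def equivalent(f, g, nvars):
    """Two propositional formulas are equivalent iff they agree on
    every truth assignment (exhaustive truth-table check)."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=nvars))

# Hypothetical encodings of two differently worded requirements over
# (logged_in, cart_nonempty, checkout_enabled):
# R1: "If the user is logged in and the cart is non-empty, checkout is enabled."
# R2: "Checkout is enabled whenever the cart is non-empty and the user is logged in."
r1 = lambda li, ce, ck: implies(li and ce, ck)
r2 = lambda li, ce, ck: implies(ce and li, ck)

print(equivalent(r1, r2, 3))  # True
```

In practice an SMT solver or theorem prover would replace the enumeration, since first-order formulas with quantifiers cannot be checked by finite truth tables.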
📝 Abstract
Autoformalization, the process of translating informal statements into formal logic, has gained renewed interest with the emergence of powerful Large Language Models (LLMs). While LLMs show promise in generating structured outputs from natural language (NL), such as Gherkin scenarios from NL feature requirements, there is currently no formal method to verify whether these outputs are accurate. This paper takes a preliminary step toward addressing this gap by exploring the use of a simple LLM-based autoformalizer to verify LLM-generated outputs against a small set of natural language requirements. We conducted two distinct experiments. In the first, the autoformalizer correctly identified that two differently worded NL requirements were logically equivalent, demonstrating the pipeline's potential for consistency checks. In the second, the autoformalizer identified a logical inconsistency between a given NL requirement and an LLM-generated output, highlighting its utility as a formal verification tool. Our findings, while limited, suggest that autoformalization holds significant potential for ensuring the fidelity and logical consistency of LLM-generated outputs, laying a foundation for future, more extensive studies of this novel application.
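The second experiment's inconsistency check can be sketched in the same spirit: a requirement and a generated scenario are jointly inconsistent when no truth assignment satisfies both of their formalizations. The encoding below is a hypothetical propositional example, not the paper's actual formulas, and uses exhaustive enumeration in place of a real prover.

```python
from itertools import product

def implies(p, q):
    # Material implication: p -> q
    return (not p) or q

def inconsistent(formulas, nvars):
    """Formulas are jointly inconsistent iff no truth assignment
    satisfies all of them simultaneously."""
    return not any(all(f(*v) for f in formulas)
                   for v in product([False, True], repeat=nvars))

# Hypothetical encoding over (account_locked, login_succeeds):
# Requirement: "A locked account cannot log in."
# Generated scenario (faulty): asserts a locked account that logs in.
requirement = lambda locked, login: implies(locked, not login)
generated   = lambda locked, login: locked and login

print(inconsistent([requirement, generated], 2))  # True
```

A satisfiability check like this is the standard way to surface contradictions: if the conjunction of requirement and output is unsatisfiable, the generated artifact provably violates the requirement.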