🤖 AI Summary
This work proposes a neuro-symbolic framework for scalable formal verification of C programs generated by large language models (LLMs). The approach automatically synthesizes memory-aware formal function specifications from natural language descriptions and function signatures, and deliberately targets specification generation rather than the synthesis of complex loop invariants. It combines in-context learning with LLMs, symbolic reasoning, compiler diagnostics, and a formal verification toolchain. It also introduces a machine-checkable refinement mechanism, based on counterexamples and symbolic refutation, that iteratively improves candidate specifications. Experiments on LeetCode-C-Spec, a newly constructed benchmark of 200 C programming problems, show that iterative refinement significantly improves the syntactic validity of generated specifications, while symbolic refutation substantially improves the accuracy of correctness judgments.
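The iterative refinement described above can be pictured as a generate-check-retry loop in which checker diagnostics are fed back to the generator. The Python sketch is illustrative only. `refine`, `toy_generate`, and `toy_check` are hypothetical names. The LLM and the verification toolchain are replaced by toy stand-ins, and the `requires`/`ensures` strings are not claimed to be the paper's specification language.

```python
from typing import Callable, Optional, Tuple

# Hypothetical interfaces, not the paper's API:
#   generate(prompt, feedback) -> candidate specification text (an LLM call in practice)
#   check(candidate) -> (ok, diagnostics) (a parser/verifier run in practice)
Generate = Callable[[str, Optional[str]], str]
Check = Callable[[str], Tuple[bool, Optional[str]]]


def refine(generate: Generate, check: Check, prompt: str,
           max_rounds: int = 5) -> Tuple[Optional[str], int]:
    """Regenerate until a candidate passes `check` or the round budget runs out."""
    feedback: Optional[str] = None
    for round_idx in range(1, max_rounds + 1):
        candidate = generate(prompt, feedback)
        ok, diagnostics = check(candidate)
        if ok:
            return candidate, round_idx
        feedback = diagnostics  # diagnostics become context for the next attempt
    return None, max_rounds


def toy_generate(prompt: str, feedback: Optional[str]) -> str:
    # Stand-in for an LLM: malformed on the first try, fixed once diagnostics arrive.
    if feedback is None:
        return "requires n > 0 ensures result >= 0"
    return "requires n > 0; ensures result >= 0;"


def toy_check(candidate: str) -> Tuple[bool, Optional[str]]:
    # Stand-in for a syntax checker: the specification must end with ';'.
    ok = candidate.rstrip().endswith(";")
    return ok, (None if ok else "syntax error: expected ';'")


print(refine(toy_generate, toy_check, "Return the maximum element of a[0..n-1]."))
# ('requires n > 0; ensures result >= 0;', 2)
```

In this toy run the first candidate is rejected, and its diagnostic lets the second attempt succeed. A real pipeline would also bound the prompt size as diagnostics accumulate.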
📝 Abstract
Formal verification of memory-manipulating programs critically depends on precise, expert-written function specifications that capture memory states. This requirement has become a major bottleneck as large language models (LLMs) increasingly generate low-level systems code whose correctness cannot be assumed. To enable scalable formal verification, we focus exclusively on function specification generation, deliberately avoiding the synthesis of complex loop invariants that are central to traditional verification pipelines. We propose a neuro-symbolic framework that automatically generates memory-aware formal function specifications for C programs from natural language problem descriptions and function signatures. The pipeline first produces candidate specifications via in-context learning and then iteratively refines them using compiler diagnostics from symbolic provers and the verification toolchain. In particular, we validate each candidate specification by constructing, on concrete examples, a proof of its negation. This enables machine-checked rejection of plausible-but-incorrect specifications. To support systematic evaluation, we introduce LeetCode-C-Spec, a new benchmark of 200 C programming problems for generating memory-aware formal function specifications. Experiments show that iterative refinement substantially improves syntactic validity. Refutation based on symbolic provers significantly improves correctness assessment by filtering out false positives that LLM-only judges frequently accept. Our results demonstrate that combining neural generation with symbolic feedback is an effective approach to formal specification synthesis for memory-safe systems software.