🤖 AI Summary
This work addresses the challenge of ensuring program correctness in natural language-to-code generation, which is often hindered by the absence of high-quality formal specifications. The authors propose VeriSpecGen, a framework that decomposes natural language requirements into atomic clauses through a traceable refinement mechanism, generates requirement-driven tests with explicit traceability mappings, and synthesizes formal specifications aligned with user intent by localizing and repairing faulty clauses upon verification failure. Integrating large language models (e.g., Claude Opus 4.5) with the Lean proof assistant, the approach leverages refinement trajectories to generate 343K training samples, substantially enhancing model generalization and reasoning capabilities. Evaluated on the VERINA SpecGen benchmark, VeriSpecGen achieves an accuracy of 86.6%, outperforming the best baseline by up to 31.8 percentage points and demonstrating a relative improvement of 62–106% in specification synthesis performance.
📝 Abstract
Large language models are increasingly used to generate code from natural language, but ensuring correctness remains challenging. Formal verification offers a principled way to obtain such guarantees by proving that a program satisfies a formal specification. However, specifications are frequently missing in real-world codebases, and writing high-quality specifications remains expensive and expertise-intensive. We present VeriSpecGen, a traceable refinement framework that synthesizes intent-aligned specifications in Lean through requirement-level attribution and localized repair. VeriSpecGen decomposes natural language into atomic requirements and generates requirement-targeted tests with explicit traceability maps to validate generated specifications. When validation fails, the traceability maps attribute failures to specific requirements, enabling targeted clause-level repairs. VeriSpecGen achieves 86.6% on the VERINA SpecGen task using Claude Opus 4.5, improving over baselines by up to 31.8 points across different model families and scales. Beyond inference-time gains, we generate 343K training examples from VeriSpecGen refinement trajectories and show that training on them improves specification synthesis by a relative 62–106% and transfers the gains to general reasoning abilities.
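To make the notion of an intent-aligned specification built from atomic requirements concrete, here is a hypothetical illustration (not drawn from the paper): a Lean specification for the requirement "return the maximum element of a non-empty list," decomposed into two atomic clauses, each of which could be validated and repaired independently.

```lean
-- Hypothetical example: a Lean 4 specification for
-- "return the maximum element of the input list",
-- decomposed into atomic requirement clauses.
def maxSpec (xs : List Int) (r : Int) : Prop :=
  -- Clause 1: the result is an element of the input list
  r ∈ xs ∧
  -- Clause 2: the result is an upper bound on every element
  ∀ x ∈ xs, x ≤ r
```

Under a clause-level repair scheme like the one the abstract describes, a failing requirement-targeted test could be traced back to exactly one conjunct (e.g., clause 2 if the candidate spec omitted the upper-bound condition), so only that clause would be regenerated rather than the whole specification.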