🤖 AI Summary
The scarcity of high-quality, formally verified code severely constrains the training and application of large language models (LLMs) in program verification.
Method: We propose ATLAS, the first automated data synthesis pipeline tailored for formal verification, built on the Dafny language. Our approach introduces a multi-stage task decomposition paradigm that jointly generates specifications, implementations, and machine-checkable proofs, yielding more than seven fine-grained training samples per program.
Contribution/Results: We construct the largest verified-code dataset to date—comprising 2,700 fully verified programs and over 19,000 samples—and perform verification-aware fine-tuning and synthetic-data distillation on Qwen2.5-7B-Coder. Experiments demonstrate absolute accuracy improvements of 23 percentage points on DafnyBench and 50 percentage points on DafnySynthesis, substantially alleviating the data bottleneck for LLMs in program verification.
📝 Abstract
Large language models have shown potential for program verification, but progress is hindered by the scarcity of verified code for training. We present ATLAS, an automated pipeline that synthesizes verified programs at scale to address this data bottleneck. ATLAS generates complete Dafny programs with specifications, implementations, and proofs, producing 2.7K verified programs from which we extract over 19K training examples (more than 7 per verified program) by decomposing the synthesis process into multiple specialized tasks. Fine-tuning Qwen2.5-7B-Coder on this dataset produces substantial gains: +23 percentage points on DafnyBench and +50 percentage points on DafnySynthesis. These results demonstrate that synthetic verified code can effectively enhance LLM capabilities for formal verification.
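For readers unfamiliar with Dafny, the following minimal sketch (illustrative only, not taken from the paper or its dataset) shows the three components the pipeline jointly generates: a specification (the `ensures` contract), an implementation (the method body), and a machine-checkable proof obligation that Dafny's verifier discharges automatically.

```dafny
// Specification: the contract states what Max must guarantee.
method Max(a: int, b: int) returns (m: int)
  ensures m >= a && m >= b
  ensures m == a || m == b
{
  // Implementation: a straightforward branch.
  if a >= b {
    m := a;
  } else {
    m := b;
  }
}
// Proof: for this simple method, Dafny verifies the postconditions
// automatically; programs with loops would additionally require
// explicit invariant annotations to be accepted by the verifier.
```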