TerraFormer: Automated Infrastructure-as-Code with LLMs Fine-Tuned via Policy-Guided Verifier Feedback

📅 2026-01-13
📈 Citations: 0 · Influential: 0
🤖 AI Summary
This work addresses the challenge of large language models (LLMs) generating Infrastructure-as-Code (IaC) that is syntactically incorrect, non-deployable, or policy-violating. The authors propose TerraFormer, the first neuro-symbolic framework that integrates policy-guided formal-verification feedback into LLM fine-tuning. TerraFormer combines supervised fine-tuning with verifier-guided reinforcement learning and introduces a multi-stage self-correction mechanism. The study also contributes two high-quality natural-language-to-IaC datasets, TF-Gen and TF-Mutn. Evaluated on IaC-Eval, TF-Gen, and TF-Mutn, TerraFormer improves accuracy over its base LLM by 15.94%, 11.65%, and 19.60% respectively, significantly outperforming 17 baselines (including GPT-4.1) and demonstrating superior compliance with security policies and infrastructure best practices.
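The multi-stage self-correction mechanism mentioned above can be pictured as a generate-verify-regenerate loop. The sketch below is illustrative only: `generate` and the verifier stages (`self_correct`, `toy` names, the feedback-prompt format) are hypothetical stand-ins; in TerraFormer the stages would be backed by real tools such as `terraform validate`, a deployment check, and a policy scanner.

```python
# Minimal sketch of a multi-stage self-correction loop (hypothetical API,
# not TerraFormer's actual implementation).
from typing import Callable, Optional

# A verifier stage returns an error message, or None when the stage passes.
Verifier = Callable[[str], Optional[str]]


def self_correct(generate: Callable[[str], str],
                 prompt: str,
                 stages: list[Verifier],
                 max_rounds: int = 3) -> tuple[str, bool]:
    """Regenerate with verifier feedback until all stages pass or budget runs out."""
    candidate = generate(prompt)
    for _ in range(max_rounds):
        # Run stages in order (e.g. syntax -> deployability -> policy);
        # stop at the first failure so feedback targets one problem at a time.
        error = next((err for v in stages if (err := v(candidate)) is not None), None)
        if error is None:
            return candidate, True
        # Feed the verifier's error message back into the next generation.
        candidate = generate(f"{prompt}\n# Fix this error: {error}\n{candidate}")
    # Final check after the last regeneration.
    passed = all(v(candidate) is None for v in stages)
    return candidate, passed
```

The staged ordering matters: there is no point running a policy check on configuration that does not even parse, so earlier failures short-circuit later stages.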

📝 Abstract
Automating Infrastructure-as-Code (IaC) is challenging, and large language models (LLMs) often produce incorrect configurations from natural language (NL). We present TerraFormer, a neuro-symbolic framework for IaC generation and mutation that combines supervised fine-tuning with verifier-guided reinforcement learning, using formal verification tools to provide feedback on syntax, deployability, and policy compliance. We curate two large, high-quality NL-to-IaC datasets, TF-Gen (152k instances) and TF-Mutn (52k instances), via multi-stage verification and iterative LLM self-correction. Evaluations against 17 state-of-the-art LLMs, including ~50x larger models like Sonnet 3.7, DeepSeek-R1, and GPT-4.1, show that TerraFormer improves correctness over its base LLM by 15.94% on IaC-Eval, 11.65% on TF-Gen (Test), and 19.60% on TF-Mutn (Test). It outperforms larger models on both TF-Gen (Test) and TF-Mutn (Test), ranks third on IaC-Eval, and achieves top best-practices and security compliance.
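The abstract describes verifier feedback on three axes: syntax, deployability, and policy compliance. One plausible way to fold such outcomes into a scalar signal for verifier-guided reinforcement learning is staged reward shaping, sketched below. The function name and the specific weights are assumptions for illustration; the paper does not specify TerraFormer's exact reward here.

```python
# Hedged sketch of staged reward shaping from verifier outcomes
# (weights are illustrative, not from the paper).

def verifier_reward(syntax_ok: bool, deployable: bool, policy_ok: bool) -> float:
    """Later checks only contribute if the earlier ones pass."""
    if not syntax_ok:
        return 0.0          # unparseable output earns nothing
    reward = 0.3            # validates syntactically
    if deployable:
        reward += 0.4       # plan/apply would succeed
        if policy_ok:
            reward += 0.3   # meets security and best-practice policies
    return reward
```

Gating the policy bonus on deployability keeps the model from gaming the reward with configurations that satisfy policies but cannot actually be deployed.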
Problem

Research questions and friction points this paper is trying to address.

Infrastructure-as-Code
Large Language Models
Natural Language to Code
Policy Compliance
Configuration Correctness
Innovation

Methods, ideas, or system contributions that make the work stand out.

Infrastructure-as-Code
Large Language Models
Formal Verification
Reinforcement Learning
Policy Compliance