Deployability-Centric Infrastructure-as-Code Generation: An LLM-based Iterative Framework

📅 2025-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Existing LLM-based Infrastructure-as-Code (IaC) generation methods overemphasize syntactic correctness while neglecting deployability, the core measure of a template's practical value. To address this, we propose IaCGen, a deployment-oriented iterative IaC generation framework. The work introduces a generation paradigm that explicitly optimizes for deployment success rate; constructs DPIaC-Eval, a 153-scenario benchmark covering syntax, deployability, intent alignment, and security; and designs a deployment validation feedback loop coupled with multi-stage prompt engineering. Experiments show that IaCGen achieves over 90% deployment success within 25 iterations across mainstream LLMs, with Claude-3.5 reaching 98%, up from 30.2% on the first attempt. Crucially, the analysis identifies user intent alignment (only 25.2% accuracy) and security compliance (only 8.4% pass rate) as the primary remaining bottlenecks in current IaC generation.

📝 Abstract

Infrastructure-as-Code (IaC) generation holds significant promise for automating cloud infrastructure provisioning. Recent advances in Large Language Models (LLMs) present a promising opportunity to democratize IaC development by generating deployable infrastructure templates from natural language descriptions, but current evaluation focuses on syntactic correctness while ignoring deployability, the critical measure of IaC template utility. We address this gap through two contributions: (1) IaCGen, an LLM-based deployability-centric framework that uses an iterative feedback mechanism to generate IaC templates, and (2) DPIaC-Eval, a deployability-centric IaC template benchmark consisting of 153 real-world scenarios that evaluates syntax, deployment, user intent, and security. Our evaluation reveals that state-of-the-art LLMs initially perform poorly, with Claude-3.5 and Claude-3.7 achieving only 30.2% and 26.8% deployment success on the first attempt, respectively. However, IaCGen improves this performance dramatically: all evaluated models reach over 90% passItr@25, with Claude-3.5 and Claude-3.7 achieving a 98% success rate. Despite these improvements, critical challenges remain in user intent alignment (25.2% accuracy) and security compliance (8.4% pass rate), highlighting areas requiring continued research. Our work provides the first comprehensive assessment of deployability-centric IaC template generation and establishes a foundation for future research.
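
As a rough illustration of the passItr@25 figure quoted in the abstract, the sketch below assumes that passItr@k denotes the fraction of benchmark scenarios whose template deploys successfully within k iterations. The abstract does not define the metric, so this reading, the function name `pass_itr_at_k`, and the input format are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def pass_itr_at_k(first_success: Sequence[Optional[int]], k: int) -> float:
    """Fraction of scenarios that first deployed successfully within k iterations.

    Assumed input format (hypothetical): one entry per scenario, holding the
    1-based iteration at which deployment first succeeded, or None if the
    scenario never deployed.
    """
    if not first_success:
        raise ValueError("need at least one scenario")
    solved = sum(1 for it in first_success if it is not None and it <= k)
    return solved / len(first_success)


# Made-up data for five scenarios, not results from the paper.
example = [1, 3, None, 25, 26]
first_attempt_rate = pass_itr_at_k(example, 1)    # only the first scenario counts
within_25_rate = pass_itr_at_k(example, 25)       # scenarios solved at iterations 1, 3, 25
```

Under this reading, first-attempt deployment success corresponds to k = 1, which is how the 30.2% and 26.8% figures would relate to the over-90% passItr@25 results.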

Problem

Research questions and friction points this paper is trying to address.

Addressing the deployability gap in LLM-generated IaC templates

Improving deployment success rates of IaC using iterative feedback

Evaluating user intent alignment and security in IaC generation

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based iterative framework for IaC generation

Deployability-centric benchmark with real-world scenarios

Iterative deployment feedback lifts deployment success above 90% within 25 iterations for all evaluated models
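
The iterative feedback idea listed above can be sketched as a generate, validate, and retry loop. This is a minimal sketch under stated assumptions, not IaCGen's actual implementation: `generate` stands in for an LLM call, `validate` stands in for whatever syntax or deployment check is used, and the prompt wording is invented for illustration. Only the 25-iteration budget comes from the abstract.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class ValidationResult:
    """Outcome of a (hypothetical) validation or deployment attempt."""
    ok: bool
    error: str = ""


def generate_with_feedback(
    request: str,
    generate: Callable[[str], str],               # assumed: prompt -> IaC template text
    validate: Callable[[str], ValidationResult],  # assumed: template -> check result
    max_iterations: int = 25,                     # budget matching passItr@25
) -> Tuple[Optional[str], int]:
    """Return (template, iterations used); template is None if the budget runs out."""
    prompt = request
    for iteration in range(1, max_iterations + 1):
        template = generate(prompt)
        result = validate(template)
        if result.ok:
            return template, iteration
        # Feed the failure back so the next attempt can correct it.
        prompt = (
            f"{request}\n\n"
            f"Previous template:\n{template}\n\n"
            f"Deployment error:\n{result.error}\n\n"
            "Return a corrected template."
        )
    return None, max_iterations
```

Passing the generator and validator in as callables keeps the loop independent of any particular model or cloud provider, which also makes it easy to exercise with stubs before wiring in real services.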
