🤖 AI Summary
Current AI coding agents treat large language models (LLMs) as autonomous decision-makers, leading to stochastic failures—including hallucinated syntax and gamed unit tests—due to uncontrolled generation. Method: This paper proposes a neuro-symbolic system for software engineering, demoting the LLM to a controlled environment component governed by deterministic workflows. It introduces three core innovations: (1) a dual-state architecture separating workflow state from environment state; (2) atomic action pairs that couple code generation with immediate verification; and (3) guard functions mapping probabilistic LLM outputs to discrete, observable states. Contribution/Results: For the first time, it imports deterministic control paradigms from classical software engineering into LLM agent design. Evaluated across 13 models (1.3B–15B parameters), the approach improves task success rate by up to 66 percentage points with only 1.2–2.1× computational overhead—achieving robustness gains through architectural refinement rather than model scaling.
📝 Abstract
Current approaches to AI coding agents blur the line between the Large Language Model (LLM) and the agent itself, asking the LLM to make decisions best left to deterministic processes. This leads to systems prone to stochastic failures such as gaming unit tests or hallucinating syntax. Drawing on established software engineering practices that provide deterministic frameworks for managing unpredictable processes, this paper proposes setting the control boundary such that the LLM is treated as a component of the environment—preserving its creative stochasticity—rather than as the decision-making agent.
A **Dual-State Architecture** is formalized, separating workflow state (deterministic control flow) from environment state (stochastic generation). **Atomic Action Pairs** couple generation with verification as indivisible transactions, where **Guard Functions** act as sensing actions that project probabilistic outputs onto observable workflow state. The framework is validated on three code generation tasks across 13 LLMs (1.3B–15B parameters). For qualified instruction-following models, task success rates improved by up to 66 percentage points at 1.2–2.1× baseline computational cost. The results suggest that architectural constraints can substitute for parameter scale in achieving reliable code generation.
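The mechanism described in the abstract can be illustrated with a minimal sketch. All names below (`Workflow`, `GuardState`, `syntax_guard`, `atomic_action`) are hypothetical illustrations, not the paper's API: a guard function projects a stochastic LLM output onto a discrete observable state, an atomic action pair retries generation and verification as one transaction, and the deterministic workflow state is kept separate from the stochastic environment state.

```python
import ast
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional

class GuardState(Enum):
    """Discrete, observable workflow states a guard can project onto."""
    PASS = "pass"
    FAIL = "fail"

def syntax_guard(code: str) -> GuardState:
    """Guard function: map a probabilistic LLM output to a discrete state.

    Here the sensing action is a syntax check; a real system might also
    run tests or type checkers as additional guards."""
    try:
        ast.parse(code)
        return GuardState.PASS
    except SyntaxError:
        return GuardState.FAIL

@dataclass
class Workflow:
    """Dual-state architecture: deterministic workflow state is held apart
    from the environment state where stochastic generation lives."""
    workflow_state: Optional[GuardState] = None          # deterministic control flow
    environment_state: dict = field(default_factory=dict)  # stochastic outputs

    def atomic_action(self,
                      generate: Callable[[], str],
                      guard: Callable[[str], GuardState],
                      max_retries: int = 3) -> GuardState:
        """Atomic action pair: generation and verification as one indivisible
        transaction. Workflow state updates only after the guard observes
        the output, never directly from the raw generation."""
        state = GuardState.FAIL
        for _ in range(max_retries):
            output = generate()                       # stochastic step (LLM call)
            self.environment_state["last_output"] = output
            state = guard(output)                     # sensing action
            if state is GuardState.PASS:
                break
        self.workflow_state = state                   # deterministic transition
        return state

# Usage with a stand-in "LLM" that returns a fixed snippet
wf = Workflow()
result = wf.atomic_action(lambda: "def add(a, b):\n    return a + b", syntax_guard)
```

In this sketch the control flow (retry, transition) is fully deterministic; only the `generate` callable is stochastic, matching the control boundary the abstract describes.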