🤖 AI Summary
This work addresses the frequent lack of semantic validity in software artifacts generated by large language models (LLMs) for software engineering tasks. It proposes projectional decoding, a conceptual framework that maintains a partial graph model alongside the generated text and treats that model as the primary artifact representation throughout generation. Building the partial graph incrementally as tokens are predicted embeds domain semantics directly into the decoding process. This enables explicit uncertainty modeling, incremental semantic validation with native error detection, and guidance toward semantically valid outputs with provable guarantees, laying a foundation for verifiable LLM-driven software engineering automation. Preliminary results on a program generation task indicate that the approach can improve the semantic validity of generated artifacts.
📝 Abstract
Large language models (LLMs) are increasingly used to generate software artifacts across many software engineering (SE) tasks, yet ensuring the semantic validity of these artifacts remains a fundamental challenge. Existing constrained decoding techniques can enforce syntactic correctness and, in some cases, specific semantic rules, but lack a general representation that bridges LLM-generated text with the reasoning required for semantic validation in SE. In this paper, we propose projectional decoding, a novel conceptual framework that integrates domain semantics directly into the generation process by maintaining, alongside text, a partial graph model as the primary artifact representation throughout generation. This abstract representation enables incremental semantic validation by explicitly capturing uncertainty and natively supporting error detection, while guiding generation toward semantically valid outputs with provable guarantees. We present preliminary results on a program generation task which demonstrate the potential of this approach to improve the semantic validity of LLM-generated artifacts. We also discuss how projectional decoding can enable verifiable automation with LLMs across various SE activities.
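To make the idea of validating against a partial model during generation more concrete, here is a minimal illustrative sketch. It is not the authors' implementation: the `PartialModel` class, the `decode` function, the toy `let`/`use` statement language, and the single "declare before use" rule are all assumptions introduced for illustration, and the ranked candidate lists stand in for an LLM's next-step proposals.

```python
from dataclasses import dataclass, field


@dataclass
class PartialModel:
    """Hypothetical toy partial graph: declared variables are nodes, uses are edges."""
    declared: set = field(default_factory=set)
    uses: list = field(default_factory=list)  # (use_index, variable_name)

    def try_extend(self, statement):
        """Return an extended copy, or None if the statement violates the toy rule."""
        kind, name = statement
        if kind == "let":
            if name in self.declared:
                return None  # redeclaration is rejected
            return PartialModel(self.declared | {name}, list(self.uses))
        if kind == "use":
            if name not in self.declared:
                return None  # use before declaration is rejected
            return PartialModel(set(self.declared),
                                self.uses + [(len(self.uses), name)])
        return None  # unknown statement kind


def decode(ranked_candidates_per_step):
    """Pick, at each step, the highest-ranked candidate the partial model accepts."""
    model = PartialModel()
    output = []
    for candidates in ranked_candidates_per_step:
        for statement in candidates:
            extended = model.try_extend(statement)
            if extended is not None:
                model = extended
                output.append(statement)
                break
        else:
            raise ValueError("no semantically valid candidate at this step")
    return output, model


# Mock proposals: the top-ranked candidate at each step is semantically invalid.
steps = [
    [("use", "x"), ("let", "x")],
    [("use", "y"), ("use", "x")],
]
program, final_model = decode(steps)
print(program)
```

In this sketch the text (the list of chosen statements) and the graph (declared names and use edges) grow together, and an invalid candidate is rejected as soon as it is proposed rather than after the full output has been produced.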