Verbal Process Supervision Elicits Better Coding Agents

📅 2025-03-24
🤖 AI Summary
Large language models (LLMs) achieve strong performance on code generation benchmarks but remain limited in complex software engineering tasks—such as multi-file debugging, requirement comprehension, and system-level refactoring—due to insufficient reasoning depth and lack of process controllability. To address these limitations, we propose CURA, the first coding agent framework integrating Verbal Process Supervision (VPS): a structured mechanism that explicitly models reasoning steps, dynamically calibrates intermediate states, and tightly couples multi-step code understanding with test-time process feedback. Built upon reasoning-optimized models (e.g., o3-mini), CURA achieves a 3.65% absolute improvement over strong baselines on high-difficulty benchmarks including BigCodeBench, establishing a new state-of-the-art. Its core contribution lies in shifting LLM-based coding agents from opaque, generative paradigms toward transparent, interpretable, and human-intervenable reasoning-centric frameworks.
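As a rough illustration of the step-wise feedback loop described above, the following Python sketch runs a sequence of reasoning steps and lets a supervisor return natural-language feedback on each intermediate output before the agent moves on. It is not CURA's implementation: the step names, the `generate` and `supervise` callables, and the `max_revisions` parameter are all hypothetical, and toy stand-ins replace the LLM calls so that the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interfaces (assumptions, not taken from the paper):
# a generator proposes output for a step given the latest feedback;
# a supervisor returns verbal feedback, or "" to accept the output.
Generator = Callable[[str, str], str]   # (step, feedback) -> output
Supervisor = Callable[[str, str], str]  # (step, output) -> feedback


@dataclass
class StepTrace:
    """Record of every attempt and critique for one reasoning step."""
    step: str
    attempts: List[str] = field(default_factory=list)
    feedback: List[str] = field(default_factory=list)


def run_with_verbal_supervision(
    steps: List[str],
    generate: Generator,
    supervise: Supervisor,
    max_revisions: int = 2,
) -> List[StepTrace]:
    """Run each step, revising it while the supervisor keeps objecting."""
    traces = []
    for step in steps:
        trace = StepTrace(step)
        feedback = ""
        for _ in range(max_revisions + 1):
            output = generate(step, feedback)
            trace.attempts.append(output)
            feedback = supervise(step, output)
            if not feedback:  # empty feedback means the step is accepted
                break
            trace.feedback.append(feedback)
        traces.append(trace)
    return traces


# Toy stand-ins for LLM calls, used only to make the sketch runnable.
def toy_generate(step: str, feedback: str) -> str:
    if step == "write code":
        if feedback:
            return "def add(a, b): return a + b"
        return "def add(a, b): return a - b"  # deliberately buggy first draft
    return f"notes for {step}"


def toy_supervise(step: str, output: str) -> str:
    if step == "write code" and "a - b" in output:
        return "add() subtracts its arguments; it should return a + b"
    return ""


traces = run_with_verbal_supervision(
    ["understand task", "write code"], toy_generate, toy_supervise
)
for t in traces:
    print(t.step, "| attempts:", len(t.attempts), "| critiques:", t.feedback)
```

Keeping a per-step trace is one way to make the intermediate reasoning inspectable, which is the property the summary emphasizes; a real system would replace the toy callables with model calls and would likely need a richer acceptance signal than an empty string.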

📝 Abstract
The emergence of large language models and their application as AI agents has significantly advanced the state of the art on code generation benchmarks, transforming modern software engineering tasks. However, even with reasoning models that use test-time compute, these systems still struggle with complex software engineering challenges. This work introduces CURA, a code understanding and reasoning agent system enhanced with verbal process supervision (VPS), achieving a 3.65% improvement over baseline models on challenging benchmarks like BigCodeBench. Furthermore, CURA, when paired with the o3-mini model and VPS techniques, attains state-of-the-art performance. This work represents a step forward in integrating reasoning-driven architectures with LLM-based code generation, enabling agentic reasoning for language models to solve complex software engineering tasks.
Problem

Research questions and friction points this paper is trying to address.

Enhancing code generation with verbal process supervision
Improving AI agents for complex software engineering tasks
Integrating reasoning-driven architectures with LLM-based coding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Verbal process supervision enhances coding agents
CURA integrates reasoning-driven architectures with LLMs
VPS techniques achieve state-of-the-art performance
Hao-Yuan Chen
University of London, Mindify AI
Quantum Machine Learning, Quantum Utility, LLM Reasoning, LLM Agent
Cheng-Pong Huang
National Taiwan University of Science and Technology, Taiwan
Jui-Ming Yao
National Taiwan University of Science and Technology, Taiwan