In-Context Prompting Obsoletes Agent Orchestration for Procedural Tasks

📅 2026-04-30
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work argues that external agent orchestration is unnecessary for procedural tasks: it embeds the complete task logic in the system prompt so that the large language model (LLM) orchestrates the multi-step workflow itself, without an external controller. The comparison covers three domains (travel booking, Zoom technical support, and insurance claims processing) with task graphs of 14 to 55 nodes, and uses a LangGraph orchestrator running the same model as the external-framework condition. Under LLM-as-judge evaluation, in-context prompting scores 4.53–5.00 on a 5-point scale versus 4.17–4.84 for LangGraph, and its failure rates are lower by 12.5, 8.5, and 12 percentage points, respectively (11.5%, 0.5%, and 5% versus 24%, 9%, and 17%). The study concludes that frontier LLMs can follow a defined multi-turn procedure without an external orchestrator.
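The failure-rate gaps are differences in percentage points, not relative reductions, and the two readings give quite different numbers. The short sketch below makes the arithmetic explicit using the per-domain failure rates reported in the abstract (24%, 9%, 17% for the orchestrated system; 11.5%, 0.5%, 5% for in-context prompting). The dictionary and function names are illustrative only.

```python
# Failure rates (percent of conversations) as reported in the abstract.
orchestrated = {"travel": 24.0, "zoom": 9.0, "insurance": 17.0}
in_context = {"travel": 11.5, "zoom": 0.5, "insurance": 5.0}


def compare(domain: str) -> tuple[float, float]:
    """Return (gap in percentage points, relative reduction in percent)."""
    gap = orchestrated[domain] - in_context[domain]
    relative = 100.0 * gap / orchestrated[domain]
    return gap, round(relative, 1)


if __name__ == "__main__":
    for domain in orchestrated:
        gap, relative = compare(domain)
        print(f"{domain}: {gap} points lower, {relative}% relative reduction")
```

Read this way, the travel gap of 12.5 points corresponds to roughly halving the failure rate, while the Zoom gap of 8.5 points removes most of the failures in that domain.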
📝 Abstract
Agent orchestration frameworks -- LangGraph, CrewAI, Google ADK, OpenAI Agents SDK, and others -- place an external orchestrator above the LLM, tracking state and injecting routing instructions at every turn. We present a controlled comparison showing that for procedural tasks, this architecture is dominated by a simpler alternative: putting the entire procedure in the system prompt and letting the model self-orchestrate. Across three domains -- travel booking (14 nodes), Zoom technical support (14 nodes), and insurance claims processing (55 nodes) -- we evaluate 200 conversations per condition using LLM-as-judge scoring on five quality criteria. The in-context approach scores 4.53--5.00 on a 5-point scale while a LangGraph orchestrator using the same model scores 4.17--4.84. The orchestrated system fails on 24% of travel, 9% of Zoom, and 17% of insurance conversations, compared to 11.5%, 0.5%, and 5% for the in-context baseline. While external orchestration may have been necessary for earlier models, advances in frontier model capabilities have made it unnecessary for multi-turn conversations following a defined procedure.
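To make the architectural contrast concrete, here is a minimal sketch of the two conditions as the abstract describes them: one function renders a whole task graph into a single system prompt, the other produces the per-turn routing instruction an external orchestrator would inject. The three-node graph, the `Node` class, the prompt wording, and both function names are hypothetical; the paper's actual graphs and prompts are not shown in this summary, and no real framework API is used here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    instruction: str
    # Maps a plain-language condition to the id of the next node.
    transitions: dict[str, str] = field(default_factory=dict)


# Hypothetical fragment of a travel-booking procedure (not the paper's graph).
GRAPH = [
    Node("ask_destination", "Ask the traveler for a destination city.",
         {"destination given": "ask_dates"}),
    Node("ask_dates", "Ask for departure and return dates.",
         {"dates given": "confirm", "dates unclear": "ask_dates"}),
    Node("confirm", "Read back the itinerary and ask for confirmation."),
]


def render_system_prompt(graph: list[Node]) -> str:
    """In-context condition: the entire procedure goes into one system prompt."""
    lines = ["Follow this procedure. Track the current step yourself.", ""]
    for node in graph:
        lines.append(f"[{node.node_id}] {node.instruction}")
        for condition, target in node.transitions.items():
            lines.append(f"  - if {condition}: go to [{target}]")
        if not node.transitions:
            lines.append("  - end of procedure")
    return "\n".join(lines)


def orchestrator_turn_instruction(graph: list[Node], current_id: str) -> str:
    """Orchestrated condition: external code tracks state and injects one step."""
    node = next(n for n in graph if n.node_id == current_id)
    return f"Current step: {node.instruction}"


if __name__ == "__main__":
    print(render_system_prompt(GRAPH))
    print(orchestrator_turn_instruction(GRAPH, "ask_dates"))
```

In the first case the model sees every node and transition at once and must keep track of its own position; in the second, the caller holds `current_id` and the model only ever sees the step it is told to perform.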
Problem

Research questions and friction points this paper is trying to address.

agent orchestration
in-context prompting
procedural tasks
LLM
system prompt
Innovation

Methods, ideas, or system contributions that make the work stand out.

in-context prompting
agent orchestration
self-orchestration
procedural tasks
LLM-as-judge