Compiling Agentic Workflows into LLM Weights: Near-Frontier Quality at Two Orders of Magnitude Less Cost

📅 2026-05-21

📈 Citations: 0

✨ Influential: 0

career value

216K/year

🤖 AI Summary

This work addresses the limitations of existing agent orchestration frameworks, which rely on external schedulers and incur substantial context overhead, require state-of-the-art large language models, and risk exposing proprietary workflows. To overcome these issues, the authors propose compiling multi-node agent workflows—comprising up to 55 nodes—directly into the weights of a small fine-tuned language model, thereby creating what they term “underground agents.” This approach provides the first systematic demonstration that complex workflows can be effectively internalized within model parameters. By integrating structured workflow representations, task-specific knowledge injection, and decision-hub modeling, the method achieves performance comparable to leading models on tasks such as travel booking, Zoom customer support, and insurance claims processing, while reducing inference costs by two orders of magnitude and substantially diminishing reliance on conventional orchestration frameworks.

📝 Abstract

Agent orchestration frameworks have proliferated, collectively exceeding 290,000 GitHub stars across LangGraph, CrewAI, Google ADK, OpenAI Agents SDK, Semantic Kernel, Strands, and LlamaIndex. All follow the same pattern: an external orchestrator above the LLM, injecting instructions and routing decisions every turn. Recent work has shown this architecture is dominated for procedural tasks by simply providing the procedure in a frontier model's system prompt [Dennis et al., 2026a], at the cost of consuming the context window, requiring a frontier model for every conversation, and exposing proprietary procedures to third-party providers. Compiling the procedure into the weights of a small fine-tuned model -- creating a subterranean agent -- should resolve all of these concerns, and prior work (SimpleTOD, FireAct, SynTOD, WorkflowLLM, Agent Lumos) has shown the technique works. Yet developer adoption has overwhelmingly favored orchestration. We identify three perceived barriers and address each empirically across travel booking (14 nodes), Zoom support (14 nodes, product-specific knowledge), and insurance claims (55 nodes, 6 decision hubs).

Problem

Research questions and friction points this paper is trying to address.

Agentic Workflows

LLM Compilation

Agent Orchestration

Fine-tuned Models

Procedural Tasks

Innovation

Methods, ideas, or system contributions that make the work stand out.

compiled agents

subterranean agent

workflow compilation