🤖 AI Summary
Enterprise agent systems frequently suffer from disorganized planning, omitted tool invocations, and unstable execution due to insufficient domain-specific process knowledge. To address this, we propose Routine, a structured multi-step planning framework that systematically integrates domain tool-usage patterns via explicit instruction encoding, parameterized context propagation, and reusable process modeling. In a real-world enterprise scenario, Routine raises tool-invocation accuracy from 41.1% to 96.3% for GPT-4o and from 32.6% to 83.3% for Qwen3-14B. We further design a Routine-following data distillation method to construct a high-quality, multi-step tool-invocation dataset: Qwen3-14B fine-tuned on Routine-following data reaches 88.2%, while fine-tuning on the distilled dataset reaches 95.5%, closely matching GPT-4o's performance. This work is the first to introduce structured process modeling coupled with instruction–parameter co-design into agent planning, significantly enhancing cross-scenario generalization and execution robustness.
📝 Abstract
The deployment of agent systems in enterprise environments is often hindered by several challenges: common models lack domain-specific process knowledge, leading to disorganized plans, missing key tools, and poor execution stability. To address these issues, this paper introduces Routine, a multi-step agent planning framework designed with a clear structure, explicit instructions, and seamless parameter passing to guide the agent's execution module in performing multi-step tool-calling tasks with high stability. In evaluations conducted within a real-world enterprise scenario, Routine significantly improves the accuracy of model tool calls, raising the performance of GPT-4o from 41.1% to 96.3% and Qwen3-14B from 32.6% to 83.3%. We further constructed a Routine-following training dataset and fine-tuned Qwen3-14B, increasing accuracy to 88.2% on scenario-specific evaluations and indicating improved adherence to execution plans. In addition, we employed Routine-based distillation to create a scenario-specific, multi-step tool-calling dataset. Fine-tuning on this distilled dataset raised the model's accuracy to 95.5%, approaching GPT-4o's performance. These results highlight Routine's effectiveness in distilling domain-specific tool-usage patterns and enhancing model adaptability to new scenarios. Our experimental results demonstrate that Routine provides a practical and accessible approach to building stable agent workflows, accelerating the deployment and adoption of agent systems in enterprise environments, and advancing the technical vision of AI for Process.
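To make the core idea concrete, here is a minimal, hypothetical sketch of what a Routine-style plan could look like: a list of steps, each pairing an explicit instruction with a tool and parameters, where later steps reference earlier outputs (parameter passing). The paper does not publish this schema; all names (`Step`, `run_routine`, the toy tools) are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class Step:
    # One explicit natural-language instruction bound to a concrete tool.
    instruction: str
    tool: str
    # Parameter values like "$1.order_id" reference step 1's output field.
    params: dict[str, Any] = field(default_factory=dict)

def run_routine(steps: list[Step], tools: dict[str, Callable[..., dict]]) -> list[dict]:
    """Execute steps in order, resolving '$<step>.<key>' references to earlier
    outputs — a toy version of seamless parameter passing between tool calls."""
    outputs: list[dict] = []
    for step in steps:
        resolved = {}
        for name, value in step.params.items():
            if isinstance(value, str) and value.startswith("$"):
                idx, key = value[1:].split(".")
                resolved[name] = outputs[int(idx) - 1][key]  # 1-based step index
            else:
                resolved[name] = value
        outputs.append(tools[step.tool](**resolved))
    return outputs

# Hypothetical tools for a customer-support flow.
tools = {
    "lookup_order": lambda user_id: {"order_id": "A17"},
    "check_status": lambda order_id: {"status": "shipped", "order_id": order_id},
}

routine = [
    Step("Find the user's latest order", "lookup_order", {"user_id": "u42"}),
    Step("Check its shipping status", "check_status", {"order_id": "$1.order_id"}),
]

results = run_routine(routine, tools)
print(results[-1]["status"])  # shipped
```

Because every step names its tool and its inputs explicitly, the execution model no longer has to infer which tool comes next or where its arguments come from — which is plausibly why structured plans like this stabilize multi-step tool calling.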