Routine: A Structural Planning Framework for LLM Agent System in Enterprise

📅 2025-07-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Enterprise agent systems frequently suffer from disorganized planning, omitted tool invocations, and unstable execution because common models lack domain-specific process knowledge. To address this, the paper proposes Routine, a structured multi-step planning framework that encodes domain tool-usage patterns through explicit instructions, parameterized context propagation, and reusable process modeling. With Routine guiding execution, tool-invocation accuracy rises from 41.1% to 96.3% for GPT-4o and from 32.6% to 83.3% for Qwen3-14B. The authors further construct a Routine-following training dataset and a Routine-based distillation dataset: fine-tuning Qwen3-14B on the former raises its accuracy to 88.2%, while fine-tuning on the distilled, scenario-specific multi-step tool-calling dataset raises it to 95.5%, closely matching GPT-4o's performance. The work couples structured process modeling with instruction–parameter co-design in agent planning, improving cross-scenario generalization and execution robustness.

📝 Abstract
The deployment of agent systems in an enterprise environment is often hindered by several challenges: common models lack domain-specific process knowledge, leading to disorganized plans, missing key tools, and poor execution stability. To address this, this paper introduces Routine, a multi-step agent planning framework designed with a clear structure, explicit instructions, and seamless parameter passing to guide the agent's execution module in performing multi-step tool-calling tasks with high stability. In evaluations conducted within a real-world enterprise scenario, Routine significantly increases tool-calling execution accuracy, raising the performance of GPT-4o from 41.1% to 96.3% and Qwen3-14B from 32.6% to 83.3%. We further constructed a Routine-following training dataset and fine-tuned Qwen3-14B, resulting in an accuracy increase to 88.2% on scenario-specific evaluations, indicating improved adherence to execution plans. In addition, we employed Routine-based distillation to create a scenario-specific, multi-step tool-calling dataset. Fine-tuning on this distilled dataset raised the model's accuracy to 95.5%, approaching GPT-4o's performance. These results highlight Routine's effectiveness in distilling domain-specific tool-usage patterns and enhancing model adaptability to new scenarios. Our experimental results demonstrate that Routine provides a practical and accessible approach to building stable agent workflows, accelerating the deployment and adoption of agent systems in enterprise environments, and advancing the technical vision of AI for Process.
Problem

Research questions and friction points this paper is trying to address.

Lack of domain-specific process knowledge in common models
Disorganized plans and poor execution stability in agent systems
Low accuracy in multi-step tool-calling tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-step agent planning framework
Routine-following training dataset
Routine-based distillation dataset
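To make the framework's core idea concrete, here is a minimal illustrative sketch of what a "Routine"-style structured plan might look like: an ordered list of steps, each naming a tool and wiring its arguments to the outputs of earlier steps, so parameters are passed explicitly rather than re-inferred by the model at every turn. All tool names, fields, and the `$step.field` reference syntax below are hypothetical assumptions for illustration, not taken from the paper.

```python
# Illustrative stand-ins for enterprise tool calls (hypothetical).
def lookup_employee(name):
    return {"employee_id": "E1042", "name": name}

def get_leave_balance(employee_id):
    return {"employee_id": employee_id, "days_left": 7}

TOOLS = {"lookup_employee": lookup_employee,
         "get_leave_balance": get_leave_balance}

# A routine: ordered steps with explicit instructions and parameter wiring.
# "$find.employee_id" means "take field employee_id from step 'find'".
ROUTINE = [
    {"step": "find", "tool": "lookup_employee",
     "args": {"name": "$input.name"}},
    {"step": "balance", "tool": "get_leave_balance",
     "args": {"employee_id": "$find.employee_id"}},
]

def run_routine(routine, user_input):
    """Execute each step, resolving '$step.field' references from prior results."""
    results = {"input": user_input}
    for step in routine:
        args = {}
        for key, ref in step["args"].items():
            src, field = ref.lstrip("$").split(".")
            args[key] = results[src][field]  # explicit parameter passing
        results[step["step"]] = TOOLS[step["tool"]](**args)
    return results

out = run_routine(ROUTINE, {"name": "Alice"})
print(out["balance"]["days_left"])  # → 7
```

Because each step's inputs are resolved deterministically from prior outputs, the execution module only has to follow the plan rather than re-plan at each turn, which is one plausible reading of how such a structure stabilizes multi-step tool calling.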
👥 Authors
Guancheng Zeng (Digital China AI Research)
Xueyi Chen (Master of Science, The Chinese University of Hong Kong)
Jiawang Hu (Digital China AI Research)
Shaohua Qi (Digital China AI Research)
Yaxuan Mao (Digital China AI Research)
Zhantao Wang (Digital China AI Research)
Yifan Nie (Digital China AI Research)
Shuang Li (Digital China AI Research)
Qiuyang Feng (Digital China AI Research)
Pengxu Qiu (Digital China AI Research)
Yujia Wang (Digital China AI Research)
Wenqiang Han (Digital China AI Research)
Linyan Huang (Digital China AI Research)
Gang Li (Digital China AI Research)
Jingjing Mo (Digital China AI Research)
Haowen Hu (Digital China AI Research)