🤖 AI Summary
This work addresses the limited controllability and frequent task deviation of dialogue agents based on large language models (LLMs). To this end, we propose an SOP-guided Monte Carlo Tree Search (MCTS) planning framework. The method explicitly models Standard Operating Procedures (SOPs) as structured mechanisms for controlling dialogue flow. It constructs the first multi-scenario SOP-annotated dialogue dataset, generated through semi-automatic role-playing with GPT-4o and validated by human annotators under strict quality control. For SOP prediction, it combines chain-of-thought reasoning with supervised fine-tuning, and it couples this with SOP-constrained online MCTS planning for action selection. Key contributions include: (1) the first formalization of SOPs as explicit, structured dialogue controllers; (2) the first publicly available SOP-annotated dialogue dataset spanning diverse scenarios; and (3) a novel planning framework that significantly improves task adherence. Experiments show a 27.95% improvement in action accuracy over GPT-3.5-based baseline models, along with notable gains in task focus and success rates for open-source LLMs. The code and dataset are publicly released.
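The summary does not say how an SOP is represented internally. As a minimal sketch, assuming an SOP can be modeled as a directed graph whose nodes are dialogue actions and whose edges are the permitted next steps, "task deviation" becomes the first transition the graph does not allow. The booking scenario, the action labels, and the helpers `allowed_actions` and `first_deviation` are hypothetical illustrations, not part of ChatSOP.

```python
from typing import Dict, List, Set

# Hypothetical SOP for a booking scenario: each key is a dialogue action,
# and its value is the set of actions the agent may take next.
SOP: Dict[str, Set[str]] = {
    "greet": {"ask_date"},
    "ask_date": {"ask_party_size"},
    "ask_party_size": {"check_availability"},
    "check_availability": {"confirm_booking", "offer_alternative"},
    "offer_alternative": {"ask_date", "confirm_booking"},
    "confirm_booking": {"farewell"},
    "farewell": set(),
}


def allowed_actions(sop: Dict[str, Set[str]], current: str) -> Set[str]:
    """Return the actions the SOP permits immediately after `current`."""
    return sop.get(current, set())


def first_deviation(sop: Dict[str, Set[str]], trajectory: List[str]) -> int:
    """Return the index of the first action that breaks the SOP, or -1 if none does."""
    for i in range(1, len(trajectory)):
        if trajectory[i] not in allowed_actions(sop, trajectory[i - 1]):
            return i
    return -1


on_track = ["greet", "ask_date", "ask_party_size", "check_availability", "confirm_booking"]
off_track = ["greet", "ask_date", "confirm_booking"]  # skips party size and availability
print(first_deviation(SOP, on_track))   # -1
print(first_deviation(SOP, off_track))  # 2
```

Treating the SOP as explicit data rather than as prompt text is what makes a deviation checkable: a controller can reject or re-plan any proposed action outside `allowed_actions`.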
📝 Abstract
Dialogue agents powered by Large Language Models (LLMs) perform strongly across a wide range of tasks. Despite their improved user understanding and human-like responses, their lack of controllability remains a key challenge, often leading to unfocused conversations or task failure. To address this, we introduce Standard Operating Procedures (SOPs) to regulate dialogue flow. Specifically, we propose ChatSOP, a novel SOP-guided Monte Carlo Tree Search (MCTS) planning framework designed to enhance the controllability of LLM-driven dialogue agents. To enable this, we curate a dataset of SOP-annotated multi-scenario dialogues, generated with a semi-automated role-playing system using GPT-4o and validated through strict manual quality control. Additionally, we propose a novel method that integrates Chain-of-Thought reasoning with supervised fine-tuning for SOP prediction and uses SOP-guided MCTS for optimal action planning during dialogues. Experimental results demonstrate the effectiveness of our method, including a 27.95% improvement in action accuracy over baseline models based on GPT-3.5 and notable gains for open-source models. The dataset and code are publicly available.
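To illustrate how an SOP can constrain tree search, the following is a minimal UCT sketch in which both expansion and random rollouts are restricted to SOP-permitted actions. It is not ChatSOP's planner: in the paper, SOP prediction comes from a fine-tuned LLM, whereas here the SOP graph, the goal action, the discounted reward (`GAMMA`), the horizon (`MAX_DEPTH`), and all function names are hypothetical stand-ins chosen for illustration.

```python
import math
import random
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical SOP graph: each dialogue action maps to the actions allowed next.
SOP: Dict[str, Set[str]] = {
    "greet": {"ask_date"},
    "ask_date": {"ask_party_size"},
    "ask_party_size": {"check_availability"},
    "check_availability": {"confirm_booking", "offer_alternative"},
    "offer_alternative": {"ask_date", "confirm_booking"},
    "confirm_booking": {"farewell"},
    "farewell": set(),
}
GOAL = "confirm_booking"  # hypothetical task-completion action
GAMMA = 0.5               # discount: reaching the goal in fewer turns scores higher
MAX_DEPTH = 8             # hypothetical planning horizon (path length)


def is_terminal(path: Tuple[str, ...]) -> bool:
    return path[-1] == GOAL or not SOP[path[-1]] or len(path) > MAX_DEPTH


def reward(path: Tuple[str, ...]) -> float:
    return GAMMA ** path.index(GOAL) if GOAL in path else 0.0


class Node:
    def __init__(self, path: Tuple[str, ...], parent: Optional["Node"] = None):
        self.path = path          # dialogue actions taken so far
        self.parent = parent
        self.children: Dict[str, "Node"] = {}
        self.visits = 0
        self.value = 0.0          # sum of backed-up rewards

    def untried(self) -> List[str]:
        # SOP guidance: candidates are only the actions the SOP permits next.
        if is_terminal(self.path):
            return []
        return sorted(SOP[self.path[-1]] - set(self.children))


def rollout(path: Tuple[str, ...], rng: random.Random) -> float:
    # Random simulation that still follows the SOP at every step.
    while not is_terminal(path):
        path = path + (rng.choice(sorted(SOP[path[-1]])),)
    return reward(path)


def mcts(start: str, iterations: int = 500, c: float = math.sqrt(2), seed: int = 0) -> str:
    rng = random.Random(seed)
    root = Node((start,))
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes using UCT.
        while not node.untried() and node.children:
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ch.value / ch.visits
                       + c * math.sqrt(math.log(parent.visits) / ch.visits))
        # 2. Expansion: add one SOP-permitted child.
        options = node.untried()
        if options:
            action = rng.choice(options)
            node.children[action] = Node(node.path + (action,), parent=node)
            node = node.children[action]
        # 3. Simulation, then 4. Backpropagation up to the root.
        r = rollout(node.path, rng)
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    # Recommend the most visited root action.
    return max(root.children, key=lambda a: root.children[a].visits)


print(mcts("check_availability"))
```

Because the SOP prunes the action space at every node, the search spends simulations only on transitions the procedure allows. From `check_availability`, the action that reaches the goal directly earns the highest discounted reward, so UCT concentrates its visits there.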