ChatSOP: An SOP-Guided MCTS Planning Framework for Controllable LLM Dialogue Agents

📅 2024-07-04
🤖 AI Summary
This work addresses the limited controllability and frequent task deviation of dialogue agents based on large language models (LLMs). The authors propose ChatSOP, an SOP-guided Monte Carlo Tree Search (MCTS) planning framework that treats Standard Operating Procedures (SOPs) as explicit, structured controls on dialogue flow. They build a multi-scenario, SOP-annotated dialogue dataset, generated by semi-automated role-playing with GPT-4o and validated through strict manual quality control. SOP prediction combines chain-of-thought reasoning with supervised fine-tuning, and the predicted SOP guides MCTS action planning during dialogues. Key contributions are: (1) a formalization of SOPs as explicit, structured dialogue controllers; (2) a publicly available SOP-annotated dialogue dataset spanning diverse scenarios; and (3) a planning framework that improves task adherence. Experiments show a 27.95% improvement in action accuracy over GPT-3.5-based baselines, with notable gains for open-source LLMs as well. The code and dataset are publicly released.

📝 Abstract
Dialogue agents powered by Large Language Models (LLMs) show superior performance in various tasks. Despite their better user understanding and human-like responses, their lack of controllability remains a key challenge, often leading to unfocused conversations or task failure. To address this, we introduce Standard Operating Procedures (SOPs) to regulate dialogue flow. Specifically, we propose ChatSOP, a novel SOP-guided Monte Carlo Tree Search (MCTS) planning framework designed to enhance the controllability of LLM-driven dialogue agents. To enable this, we curate a dataset of SOP-annotated multi-scenario dialogues, generated using a semi-automated role-playing system with GPT-4o and validated through strict manual quality control. Additionally, we propose a novel method that integrates Chain of Thought reasoning with supervised fine-tuning for SOP prediction and uses SOP-guided Monte Carlo Tree Search for optimal action planning during dialogues. Experimental results demonstrate the effectiveness of our method, including a 27.95% improvement in action accuracy over GPT-3.5-based baselines and notable gains for open-source models. The dataset and code are publicly available.
Problem

Research questions and friction points this paper is trying to address.

Enhance controllability in LLM dialogue agents
Regulate dialogue flow using SOP guidance
Improve action accuracy in multi-scenario dialogues
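As a rough illustration of what regulating dialogue flow with an SOP could look like, the sketch below encodes a hypothetical SOP as a directed graph of dialogue actions and checks whether a sequence of agent actions stays on it. The action names, the `SOP` dictionary, and both helper functions are assumptions made for illustration; they are not taken from the paper or its released code.

```python
# Hypothetical SOP for a refund scenario: each action maps to the actions
# that may follow it. None of these names come from the ChatSOP dataset.
SOP = {
    "START": ["greet"],
    "greet": ["ask_order_id"],
    "ask_order_id": ["verify_order"],
    "verify_order": ["offer_refund", "escalate"],
    "offer_refund": ["close"],
    "escalate": ["close"],
    "close": [],
}


def allowed_next(sop, last_action):
    """Return the actions the SOP permits after `last_action`."""
    return list(sop.get(last_action, []))


def first_violation(sop, actions):
    """Return the index of the first action that breaks the SOP, or None."""
    previous = "START"
    for index, action in enumerate(actions):
        if action not in sop.get(previous, []):
            return index
        previous = action
    return None
```

Under these assumptions, `first_violation(SOP, ["greet", "offer_refund"])` flags the second action, because the hypothetical SOP requires asking for the order ID before any refund is offered.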
Innovation

Methods, ideas, or system contributions that make the work stand out.

SOP-guided MCTS planning framework
Chain of Thought reasoning integration
SOP-annotated multi-scenario dataset
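To make the idea of SOP-guided MCTS planning more concrete, here is a minimal sketch of a generic UCT-style Monte Carlo Tree Search in which only SOP-permitted actions are ever expanded or simulated. It is not the authors' implementation: the `SOP` graph, the action names, the `reward` function (a toy stand-in for whatever evaluator the real system uses), and the `plan` helper are all assumptions chosen for illustration.

```python
import math
import random

# Hypothetical SOP graph: action -> actions allowed to follow it.
SOP = {
    "START": ["greet"],
    "greet": ["ask_order_id"],
    "ask_order_id": ["verify_order"],
    "verify_order": ["offer_refund", "escalate"],
    "offer_refund": ["close"],
    "escalate": ["close"],
    "close": [],
}


def reward(path):
    # Toy stand-in for an LLM-based or learned evaluator (assumption).
    return 1.0 if "offer_refund" in path else 0.2


class Node:
    def __init__(self, action, parent=None):
        self.action = action
        self.parent = parent
        self.children = []
        self.untried = list(SOP[action])  # SOP constraint: only these may be expanded
        self.visits = 0
        self.value = 0.0


def path_to(node):
    """Collect the action sequence from the root down to `node`."""
    path = []
    while node is not None:
        path.append(node.action)
        node = node.parent
    return path[::-1]


def uct(child, parent_visits, c=1.4):
    """Upper confidence bound: mean value plus an exploration bonus."""
    bonus = c * math.sqrt(math.log(parent_visits) / child.visits)
    return child.value / child.visits + bonus


def plan(root_action, iterations=200, seed=0):
    """Return the next action preferred by search; root must have successors."""
    rng = random.Random(seed)
    root = Node(root_action)
    for _ in range(iterations):
        node = root
        # Selection: descend through fully expanded nodes by UCT score.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: uct(ch, node.visits))
        # Expansion: add one SOP-permitted child, if any remain.
        if node.untried:
            child = Node(node.untried.pop(), parent=node)
            node.children.append(child)
            node = child
        # Simulation: random rollout that never leaves the SOP graph.
        path = path_to(node)
        current = node.action
        while SOP[current]:
            current = rng.choice(SOP[current])
            path.append(current)
        result = reward(path)
        # Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += result
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits).action
```

In this toy setting, calling `plan("verify_order")` chooses between the two SOP-permitted successors and favors the one with the higher simulated reward. A real system would replace `reward` with a model-based estimate and derive the candidate actions from the predicted SOP.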
Authors
Zhigen Li
College of Intelligence and Computing, Tianjin University, Tianjin, China
Jianxiang Peng
Tianjin University
NLP
Yanmeng Wang
Ping An Technology
Tianhao Shen
College of Intelligence and Computing, Tianjin University, Tianjin, China
Minghui Zhang
College of Intelligence and Computing, Tianjin University, Tianjin, China
Linxi Su
College of Intelligence and Computing, Tianjin University, Tianjin, China
Shang Wu
Unknown affiliation
Yihang Wu
College of Intelligence and Computing, Tianjin University, Tianjin, China
Yuqian Wang
College of Intelligence and Computing, Tianjin University, Tianjin, China
Ye Wang
Ping An Technology
Wei Hu
Ping An Technology
Jianfeng Li
Ping An Technology
Shaojun Wang
Soochow University, TU/e, University of Strasbourg
Nanophotonics, Light-matter interactions, Nanofabrication
Jing Xiao
Ping An Technology
Deyi Xiong
Professor, College of Intelligence and Computing, Tianjin University, China
Natural Language Processing, Large Language Models, AI4Science