Turning Conversations into Workflows: A Framework to Extract and Evaluate Dialog Workflows for Service AI Agents

📅 2025-02-24
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Service workflows behind customer service dialogues are often undocumented and unstructured, leading to inconsistent AI responses. Method: the paper proposes the first integrated framework for automatic dialogue workflow extraction and simulation-based evaluation. It combines retrieval-augmented generation (RAG) with a novel question-answering-based chain-of-thought (QA-CoT) prompting technique to improve the accuracy of structured workflow generation, and it introduces a scalable two-agent (agent + customer) simulation mechanism for automated, large-scale workflow assessment. Contribution/Results: a macro-accuracy metric that agrees closely with human evaluation (Spearman ρ > 0.92); on the ABCD and SynthABCD datasets, the method improves average macro accuracy by 12.16%, substantially reducing the response inconsistency caused by missing workflows in service AI systems.

📝 Abstract
Automated service agents require well-structured workflows to provide consistent and accurate responses to customer queries. However, these workflows are often undocumented, and their automatic extraction from conversations remains unexplored. In this work, we present a novel framework for extracting and evaluating dialog workflows from historical interactions. Our extraction process consists of two key stages: (1) a retrieval step to select relevant conversations based on key procedural elements, and (2) a structured workflow generation process using question-answer-based chain-of-thought (QA-CoT) prompting. To comprehensively assess the quality of extracted workflows, we introduce an automated simulation framework with agent and customer bots that measures their effectiveness in resolving customer issues. Extensive experiments on the ABCD and SynthABCD datasets demonstrate that our QA-CoT technique improves workflow extraction by 12.16% in average macro accuracy over the baseline. Moreover, our evaluation method closely aligns with human assessments, providing a reliable and scalable framework for future research.
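The two-stage extraction described in the abstract can be sketched roughly as follows. The keyword-overlap scoring and the prompt wording below are illustrative stand-ins (the paper's actual RAG retriever and QA-CoT prompt are not reproduced on this page), so treat this as a hedged sketch, not the authors' implementation.

```python
# Hedged sketch of the two-stage pipeline: (1) retrieve conversations
# relevant to a procedure, then (2) assemble a QA-style chain-of-thought
# prompt for structured workflow generation. The scoring function, question
# list, and prompt template are all illustrative assumptions.

def retrieve_relevant(conversations, procedural_keywords, top_k=2):
    """Rank conversations by overlap with key procedural elements.

    A simple keyword-overlap score stands in for the paper's
    retrieval-augmented (RAG) selection step."""
    def score(conv):
        text = conv.lower()
        return sum(kw.lower() in text for kw in procedural_keywords)
    return sorted(conversations, key=score, reverse=True)[:top_k]

def build_qa_cot_prompt(conversations, intent):
    """Frame workflow extraction as a sequence of questions the model
    answers step by step before emitting the final workflow."""
    questions = [
        "What issue is the customer trying to resolve?",
        "Which verification or information-gathering steps does the agent take?",
        "What actions does the agent perform, and in what order?",
        "Which conditions change the path (e.g. eligibility, account status)?",
    ]
    transcript = "\n\n".join(conversations)
    qa_block = "\n".join(f"Q{i + 1}: {q}\nA{i + 1}:" for i, q in enumerate(questions))
    return (
        f"Intent: {intent}\n"
        f"Conversations:\n{transcript}\n\n"
        "Answer the questions below, then output the workflow as an "
        "ordered list of steps.\n"
        f"{qa_block}"
    )

convs = [
    "Agent: I can help reset your password. First, please verify your email.",
    "Agent: Your order shipped yesterday; here is the tracking number.",
]
selected = retrieve_relevant(convs, ["password", "verify"], top_k=1)
prompt = build_qa_cot_prompt(selected, intent="password reset")
```

The prompt produced this way would then be sent to an LLM; the answers to the intermediate questions serve as the chain of thought grounding the final workflow.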
Problem

Research questions and friction points this paper is trying to address.

Extract dialog workflows from conversations
Evaluate workflow quality automatically
Improve accuracy using QA-CoT technique
Innovation

Methods, ideas, or system contributions that make the work stand out.

Retrieval of relevant conversations
QA-CoT for structured workflow
Automated simulation for evaluation
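The simulation-based evaluation above can be illustrated with a minimal sketch: a customer bot poses a scenario, an agent bot follows the extracted workflow, and macro accuracy averages per-scenario step accuracy so each scenario counts equally. The metric definition below is an assumption inferred from the summary, not the paper's exact formula, and the workflows and scenarios are made up for illustration.

```python
# Hedged sketch of simulation-style evaluation: compare the steps an agent
# bot executes (here, simply the extracted workflow) against reference steps
# per scenario, then average per-scenario accuracy (macro accuracy).
# The metric and data are illustrative assumptions.

def run_scenario(workflow, expected_steps):
    """Fraction of a scenario's reference steps covered by the workflow."""
    executed = set(workflow)
    hits = sum(step in executed for step in expected_steps)
    return hits / len(expected_steps)

def macro_accuracy(workflow_by_intent, scenarios):
    """Average per-scenario accuracy, weighting each scenario equally."""
    scores = [
        run_scenario(workflow_by_intent[intent], steps)
        for intent, steps in scenarios
    ]
    return sum(scores) / len(scores)

workflows = {
    "password_reset": ["verify identity", "send reset link", "confirm access"],
    "refund": ["check eligibility", "issue refund"],
}
scenarios = [
    ("password_reset", ["verify identity", "send reset link", "confirm access"]),
    ("refund", ["check eligibility", "issue refund", "send confirmation email"]),
]
print(round(macro_accuracy(workflows, scenarios), 3))  # → 0.833
```

In the paper's setup the executed steps would come from an LLM agent bot interacting with an LLM customer bot rather than from the workflow directly; this sketch only shows how a macro-averaged score aggregates those runs.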
Prafulla Kumar Choubey
Salesforce AI Research
Natural Language Processing, Machine Learning
Xiangyu Peng
Salesforce AI Research
Shilpa Bhagavath
Salesforce AI Research
Caiming Xiong
Salesforce Research
Machine Learning, NLP, Computer Vision, Multimedia, Data Mining
Shiva K. Pentyala
Salesforce AI Research
Chien-Sheng Wu
Salesforce AI Research