🤖 AI Summary
Existing dialogue data synthesis methods primarily emphasize lexical, topical, or behavioral diversity, neglecting task-logic diversity at the dialogue level. Method: The paper proposes a synthesis framework centered on *task-execution logical diversity*: large language models (LLMs) are prompted to generate structured, decision-tree-style task plans, from which controllable, interpretable, and logically diverse dialogue flows are derived to guide the generation of multi-turn task-oriented dialogues. Contribution/Results: The authors claim this is the first work to elevate diversity modeling from the utterance or action level to the dialogue-level task-logic level. They construct a dataset comprising 3,886 dialogue flows spanning 15 domains. Empirically, models fine-tuned on this data significantly outperform strong baselines, including GPT-4, on next-action prediction, demonstrating that task-logical diversity improves model generalization and reasoning.
📝 Abstract
Developing language-model-based dialogue agents requires effective data to train models that can follow specific task logic. However, most existing data simulation methods focus on increasing diversity in language, topics, or dialogue acts at the utterance level, largely neglecting a critical aspect: task logic diversity at the dialogue level. This paper proposes a novel data simulation method designed to enhance the diversity of synthetic dialogues by focusing on task-execution logic. Our method uses LLMs to generate decision-tree-structured task plans, which enable the derivation of diverse dialogue trajectories for a given task. Each trajectory, referred to as a "dialogue flow", guides the generation of a multi-turn dialogue that follows a unique path through the plan. We apply this method to generate a task-oriented dialogue dataset comprising 3,886 dialogue flows across 15 domains. We validate the effectiveness of this dataset on the next-action prediction task, where models fine-tuned on our dataset outperform strong baselines, including GPT-4. Upon acceptance of this paper, we plan to release the code and data publicly.
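The core idea of deriving multiple dialogue flows from one decision-tree task plan can be illustrated with a minimal sketch (this is an assumption-based illustration, not the paper's released code; the `Node` class, the action names, and the `enumerate_flows` helper are all hypothetical): each root-to-leaf path through the tree is one candidate dialogue flow.

```python
# Minimal sketch: treating a decision-tree task plan as a source of
# diverse dialogue flows, where each root-to-leaf path is one flow.
# All names here (Node, enumerate_flows, the example actions) are
# illustrative assumptions, not the paper's actual implementation.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    action: str                              # one task step, e.g. "ask_cuisine"
    children: List["Node"] = field(default_factory=list)

def enumerate_flows(node: Node, prefix: Optional[List[str]] = None) -> List[List[str]]:
    """Collect every root-to-leaf action sequence (one "dialogue flow" each)."""
    prefix = (prefix or []) + [node.action]
    if not node.children:                    # leaf: one complete flow
        return [prefix]
    flows: List[List[str]] = []
    for child in node.children:
        flows.extend(enumerate_flows(child, prefix))
    return flows

# Hypothetical restaurant-booking task plan with two execution branches
plan = Node("greet", [
    Node("ask_cuisine", [
        Node("recommend", [Node("confirm_booking")]),
        Node("no_match", [Node("offer_alternative")]),
    ]),
])

for flow in enumerate_flows(plan):
    print(" -> ".join(flow))
```

Each printed path would then be handed to an LLM as a scaffold for generating one multi-turn dialogue, so logical diversity comes from the tree's branching rather than from surface-level paraphrasing.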