LLM-Driven Multi-Turn Task-Oriented Dialogue Synthesis for Realistic Reasoning

📅 2026-02-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing benchmark datasets struggle to effectively evaluate the logical reasoning capabilities of large language models due to excessive simplification, lack of real-world relevance, contamination from pretraining data, and high annotation costs. To address these limitations, this work proposes a large language model–driven framework for synthesizing multi-turn task-oriented dialogues. The framework employs a three-stage optimization pipeline—task formulation, dialogue generation, and task refinement—combined with adversarial filtering to automatically produce contextually coherent, realistic, and logically challenging dialogues. The resulting high-quality evaluation benchmark ensures data purity and task authenticity, significantly enhancing the validity of reasoning assessment in real-world scenarios and improving model training efficacy.
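The three-stage pipeline with adversarial filtering described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's actual implementation: the function names, the stubbed solver accuracies, and the 0.5 difficulty threshold are all assumptions, and the LLM calls are replaced by placeholder logic.

```python
from dataclasses import dataclass, field

@dataclass
class ReasoningTask:
    scenario: str                                   # real-world task scenario the dialogue is grounded in
    dialogue: list = field(default_factory=list)    # multi-turn (speaker, utterance) pairs
    question: str = ""                              # reasoning question posed over the dialogue
    difficulty: float = 0.0                         # estimated challenge, used by the filter

def formulate_task(scenario):
    """Stage 1 (task formulation): draft a task with domain constraints. Stubbed LLM call."""
    return ReasoningTask(
        scenario=scenario,
        question=f"Given the {scenario} dialogue, which option satisfies all constraints?",
    )

def generate_dialogue(task, turns=3):
    """Stage 2 (dialogue generation): synthesize a coherent multi-turn dialogue. Stubbed."""
    for i in range(turns):
        speaker = "user" if i % 2 == 0 else "agent"
        task.dialogue.append((speaker, f"turn {i} about {task.scenario}"))
    return task

def refine_task(task, solver_accuracy):
    """Stage 3 (task refinement): harden the task; low solver accuracy means high difficulty."""
    task.difficulty = 1.0 - solver_accuracy
    return task

def adversarial_filter(tasks, threshold=0.5):
    """Keep only tasks a strong solver fails often enough to remain challenging."""
    return [t for t in tasks if t.difficulty >= threshold]

# One pass of the pipeline over a pool of hypothetical scenarios.
scenarios = ["flight rebooking", "insurance claim"]
solver_acc = {"flight rebooking": 0.3, "insurance claim": 0.9}  # pretend solver results
pool = [refine_task(generate_dialogue(formulate_task(s)), solver_acc[s]) for s in scenarios]
benchmark = adversarial_filter(pool)  # only the hard "flight rebooking" task survives
```

In this toy run, the easy task (solver accuracy 0.9) is filtered out, mirroring how adversarial filtering discards dialogues a strong model already solves.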

📝 Abstract
The reasoning capability of large language models (LLMs), defined as their ability to analyze, infer, and make decisions based on input information, is essential for building intelligent task-oriented dialogue systems. However, existing benchmarks do not sufficiently reflect the complexity of real-world scenarios, which limits their effectiveness in evaluating and enhancing LLM reasoning in practical contexts. Many current reasoning datasets are overly simplistic and abstract, often disconnected from realistic task flows, domain constraints, and operational rules, making it difficult to effectively evaluate LLMs' logical reasoning ability. In addition, data contamination from pretraining corpora undermines the reliability of evaluation results, and traditional crowdsourcing methods for dataset construction are labor-intensive and difficult to scale. To address these challenges, we propose an LLM-driven framework for synthesizing multi-turn, task-oriented dialogues grounded in realistic reasoning scenarios, leveraging trilevel optimization to enhance dialogue quality. Our method generates dialogues grounded in authentic task scenarios, enriched with real-world information, and exhibiting strong contextual coherence. Corresponding reasoning tasks are carefully designed around these dialogues and iteratively refined to continuously improve the tasks' quality and challenge. The resulting dataset serves as a valuable benchmark for assessing and advancing the realistic logical reasoning capabilities of LLMs. Experimental results show that our synthetic-data-based reasoning tasks introduce non-trivial reasoning challenges and provide meaningful support for improving the reasoning capabilities of LLMs.
Problem

Research questions and friction points this paper is trying to address.

reasoning capability
task-oriented dialogue
realistic scenarios
dataset contamination
benchmark evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-driven dialogue synthesis
realistic reasoning
multi-turn task-oriented dialogue
trilevel optimization
reasoning benchmark