🤖 AI Summary
This study asks whether large language models (LLMs) can be fine-tuned solely on natural-language dialogs, without turn-level dialog state or policy annotations, to perform task-oriented dialog (ToD) tasks and generalize to unseen domains. Experiments with three open-source LLMs of varying sizes (Llama, Phi, and Qwen) on two diverse ToD datasets show that models fine-tuned this way produce coherent, contextually appropriate responses, but their task completion, measured by accurate execution of API calls, reaches only about 53% in unseen domains. To close this gap, the authors propose ZeroToD, a framework built around a schema augmentation mechanism that improves API call accuracy and overall task completion, particularly out of domain. Under ZeroToD, smaller fine-tuned models outperform large proprietary LLMs prompted off the shelf, and a human study of informativeness, fluency, and task completion corroborates the empirical results.
📝 Abstract
Traditional task-oriented dialog (ToD) systems rely heavily on labor-intensive turn-level annotations, such as dialog states and policy labels, for training. This work explores whether large language models (LLMs) can be fine-tuned solely on natural language dialogs to perform ToD tasks, without requiring such annotations. We evaluate their ability to generalize to unseen domains and compare their performance with models trained on fully annotated data. Through extensive experiments with three open-source LLMs of varying sizes and two diverse ToD datasets, we find that models fine-tuned without turn-level annotations generate coherent and contextually appropriate responses. However, their task completion performance, measured by accurate execution of API calls, remains suboptimal, with the best models achieving only around 53% success in unseen domains. To improve task completion, we propose ZeroToD, a framework that incorporates a schema augmentation mechanism to enhance API call accuracy and overall task completion rates, particularly in out-of-domain settings. We also compare ZeroToD with fine-tuning-free alternatives, such as prompting off-the-shelf LLMs, and find that our framework enables smaller, fine-tuned models to outperform large-scale proprietary LLMs in task completion. A human study evaluating informativeness, fluency, and task completion further confirms these empirical findings. Together, these results suggest the feasibility of developing cost-effective, scalable, and zero-shot generalizable ToD systems for real-world applications.
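To make the idea of schema augmentation concrete, here is a minimal sketch of what prepending a domain schema to the dialog context might look like. The schema format, function name, slot descriptions, and the `ApiCall(...)` output convention below are illustrative assumptions, not ZeroToD's actual design:

```python
# Hypothetical sketch of schema-augmented prompting for ToD API calls.
# Slot names, descriptions, and the output convention are assumed, not from the paper.

def build_schema_prompt(domain: str, slots: dict[str, str], history: list[str]) -> str:
    """Prepend the target domain's schema (slot names and descriptions) to the
    dialog history so the model can ground API-call arguments in unseen domains."""
    schema_lines = [f"- {name}: {desc}" for name, desc in slots.items()]
    return (
        f"Domain: {domain}\n"
        "Schema:\n" + "\n".join(schema_lines) + "\n\n"
        "Dialog:\n" + "\n".join(history) + "\n\n"
        "When all required slots are filled, emit an API call as "
        "ApiCall(method=..., parameters={...})."
    )

prompt = build_schema_prompt(
    "hotel",
    {
        "area": "part of town",
        "stars": "hotel rating (1-5)",
        "price_range": "cheap / moderate / expensive",
    },
    [
        "User: I need a cheap hotel in the north.",
        "System: Any star rating preference?",
        "User: At least 3 stars.",
    ],
)
print(prompt.splitlines()[0])  # -> Domain: hotel
```

Because the schema lists slot names the model has never seen in training, this kind of prompt lets a fine-tuned model ground API-call parameters in a new domain at inference time.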