🤖 AI Summary
Task-oriented dialogue systems face two key challenges: inaccurate API invocation and poor cross-domain generalization in multi-turn interactions. To address these, we propose RealTOD, a framework that couples a two-stage prompt-chaining strategy for zero-shot domain adaptation with a fine-grained, schema-guided feedback mechanism for API validation and dynamic correction. The method builds on instruction-tuned large language models and requires no human-curated demonstrations. On the SGD and BiTOD benchmarks, RealTOD improves API accuracy by 37.74% over AutoTOD on SGD and by 11.26% over SimpleTOD on BiTOD. Human evaluation further confirms significant gains in task completion rate, response fluency, and informativeness. The core contributions are (1) zero-shot domain transfer without task-specific fine-tuning, and (2) a verifiable, self-correcting API generation paradigm grounded in domain schemas and iterative feedback.
📝 Abstract
Task-oriented dialogue (TOD) systems help users accomplish complex, multi-turn tasks through natural language. While traditional approaches rely on extensive fine-tuning and annotated data for each domain, instruction-tuned large language models (LLMs) offer a more flexible alternative. However, LLMs struggle to reliably handle multi-turn task completion, particularly with accurately generating API calls and adapting to new domains without explicit demonstrations. To address these challenges, we propose RealTOD, a novel framework that enhances TOD systems through prompt chaining and fine-grained feedback mechanisms. Prompt chaining enables zero-shot domain adaptation via a two-stage prompting strategy, eliminating the need for human-curated demonstrations. Meanwhile, the fine-grained feedback mechanism improves task completion by verifying API calls against domain schemas and providing precise corrective feedback when errors are detected. We conduct extensive experiments on the SGD and BiTOD benchmarks using four LLMs. RealTOD improves API accuracy, surpassing AutoTOD by 37.74% on SGD and SimpleTOD by 11.26% on BiTOD. Human evaluations further confirm that LLMs integrated with RealTOD achieve superior task completion, fluency, and informativeness compared to existing methods.
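To make the feedback mechanism concrete, the sketch below shows one plausible form of schema-guided API verification: a generated API call is checked against a domain schema, and any violations are returned as slot-level feedback messages that could be fed back to the LLM for correction. The schema layout, method name, and slot names here are illustrative assumptions, not the paper's actual implementation.

```python
# Minimal sketch of schema-guided API verification with fine-grained feedback.
# Schema format and names are hypothetical, for illustration only.

DOMAIN_SCHEMA = {
    "FindRestaurants": {
        "required": {"city", "cuisine"},
        "optional": {"price_range", "has_vegetarian_options"},
    }
}

def verify_api_call(call: dict, schema: dict) -> list[str]:
    """Check a generated API call against the domain schema.

    Returns precise, slot-level feedback messages; an empty list
    means the call passed verification.
    """
    method = call.get("method")
    if method not in schema:
        return [f"Unknown API method '{method}'. Valid methods: {sorted(schema)}"]
    spec = schema[method]
    slots = call.get("parameters", {})
    feedback = []
    # Report each missing required slot individually.
    for slot in sorted(spec["required"] - slots.keys()):
        feedback.append(f"Missing required slot '{slot}' for '{method}'.")
    # Report each slot not defined by the schema.
    allowed = spec["required"] | spec["optional"]
    for slot in sorted(slots.keys() - allowed):
        feedback.append(f"Slot '{slot}' is not in the schema for '{method}'.")
    return feedback

# One verification round that would drive corrective re-prompting:
bad_call = {"method": "FindRestaurants",
            "parameters": {"city": "Toronto", "rating": "4+"}}
for msg in verify_api_call(bad_call, DOMAIN_SCHEMA):
    print(msg)
```

In a full loop, any non-empty feedback list would be appended to the dialogue context and the LLM re-prompted until the call validates or a retry budget is exhausted.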