🤖 AI Summary
Dialogue summarization faces challenges including high supervision costs and weak task relevance, which are particularly limiting in high-stakes domains such as healthcare. To address this, we propose an unsupervised, question-answering (QA)-driven framework that eliminates the need for human-annotated summaries. First, a large language model generates dialogue summaries and corresponding task-oriented QA pairs in a zero-shot manner. Second, a QA consistency scoring mechanism automatically evaluates and filters high-quality summaries. Finally, the summarization model is fine-tuned on the selected high-scoring instances. Our approach significantly improves both information completeness and task relevance. Empirical evaluation across multiple benchmarks demonstrates performance on par with fully supervised state-of-the-art methods, while substantially outperforming existing zero-shot baselines. These results validate the method's effectiveness, generalizability, and practical deployability in real-world settings.
📝 Abstract
Dialogue summarization aims to distill the core meaning of a conversation into a concise text. This is crucial for reducing the complexity and noise inherent in dialogue-heavy applications. While recent approaches typically train language models to mimic human-written summaries, such supervision is costly and often results in outputs that lack task-specific focus, limiting their effectiveness in downstream applications, such as medical tasks. In this paper, we propose app, a framework for task-oriented utility-based dialogue summarization. app starts by generating multiple summaries and task-oriented question-answer pairs from a dialogue in a zero-shot manner using a pool of large language models (LLMs). The quality of the generated summaries is evaluated by having LLMs answer task-related questions before (i) selecting the best candidate answers and (ii) identifying the most informative summary based on these answers. Finally, we fine-tune the best LLM on the selected summaries. When validated on multiple datasets, app demonstrates its effectiveness by achieving competitive results in various zero-shot settings, rivaling fully-supervised State-of-the-Art (SotA) methods.
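The selection step above can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: `answer_fn` is a hypothetical stand-in for an LLM call that answers a question given only the summary as context, and exact-match agreement with reference answers is used as a simple proxy for the QA consistency score.

```python
# Hypothetical sketch of QA-consistency scoring over candidate summaries.
# answer_fn(context, question) stands in for an LLM call; here any callable
# with that signature works, so the logic can be tested with a toy function.

def qa_consistency_score(summary, qa_pairs, answer_fn):
    """Fraction of task questions answerable correctly from the summary alone."""
    if not qa_pairs:
        return 0.0
    correct = sum(
        answer_fn(summary, question).strip().lower() == answer.strip().lower()
        for question, answer in qa_pairs
    )
    return correct / len(qa_pairs)

def select_best_summary(candidates, qa_pairs, answer_fn):
    """Return the candidate summary with the highest QA-consistency score."""
    return max(candidates, key=lambda s: qa_consistency_score(s, qa_pairs, answer_fn))

# Toy answer function for demonstration: "answers" by string lookup in the context.
def toy_answer_fn(context, question):
    return "aspirin" if "aspirin" in context else "unknown"

qa = [("What medication was prescribed?", "aspirin")]
cands = [
    "Patient discussed symptoms with the doctor.",
    "Doctor prescribed aspirin for recurring headaches.",
]
best = select_best_summary(cands, qa, toy_answer_fn)
```

The summaries selected this way would then serve as fine-tuning targets for the summarization model, as described in the abstract.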