QUARTZ : QA-based Unsupervised Abstractive Refinement for Task-oriented Dialogue Summarization

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Dialogue summarization faces challenges including high supervision costs and weak task relevance—particularly limiting in high-stakes domains such as healthcare. To address this, we propose an unsupervised, question-answering (QA)-driven framework that eliminates the need for human-annotated summaries. First, a large language model generates dialogue summaries and corresponding task-oriented QA pairs in a zero-shot manner. Second, a QA consistency scoring mechanism automatically evaluates and filters high-quality summaries. Finally, the summarization model is fine-tuned on the selected high-scoring instances. Our approach significantly improves both information completeness and task relevance. Empirical evaluation across multiple benchmarks demonstrates performance on par with fully supervised state-of-the-art methods, while substantially outperforming existing zero-shot baselines. These results validate the method’s effectiveness, generalizability, and practical deployability in real-world settings.

📝 Abstract
Dialogue summarization aims to distill the core meaning of a conversation into a concise text. This is crucial for reducing the complexity and noise inherent in dialogue-heavy applications. While recent approaches typically train language models to mimic human-written summaries, such supervision is costly and often results in outputs that lack task-specific focus, limiting their effectiveness in downstream applications, such as medical tasks. In this paper, we propose QUARTZ, a framework for task-oriented utility-based dialogue summarization. QUARTZ starts by generating multiple summaries and task-oriented question-answer pairs from a dialogue in a zero-shot manner using a pool of large language models (LLMs). The quality of the generated summaries is evaluated by having LLMs answer task-related questions before (i) selecting the best candidate answers and (ii) identifying the most informative summary based on these answers. Finally, we fine-tune the best LLM on the selected summaries. When validated on multiple datasets, QUARTZ demonstrates its effectiveness by achieving competitive results in various zero-shot settings, rivaling fully supervised state-of-the-art (SotA) methods.
Problem

Research questions and friction points this paper is trying to address.

Unsupervised abstractive refinement for task-oriented dialogue summarization
Reducing dialogue complexity and noise in downstream applications
Generating task-focused summaries without costly human supervision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Generates summaries and QA pairs using LLMs
Selects best summary via task-oriented QA evaluation
Fine-tunes the best LLM on the selected high-scoring summaries
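The selection step above can be sketched as follows. This is a minimal, hypothetical illustration of QA-consistency scoring, not the authors' code: `answer_with_llm` stands in for a zero-shot LLM call, and the exact matching of answers is a simplifying assumption (the paper may use a softer comparison).

```python
# Hypothetical sketch of QA-based summary selection (assumed names and logic,
# not the QUARTZ implementation).

def answer_with_llm(context: str, question: str) -> str:
    # Placeholder for a real zero-shot LLM call that answers `question`
    # using only the text in `context`.
    return "yes" if question.lower() in context.lower() else "unknown"

def qa_consistency_score(summary, qa_pairs, answerer=answer_with_llm):
    """Fraction of task-oriented questions the summary answers correctly,
    compared against reference answers derived from the full dialogue."""
    if not qa_pairs:
        return 0.0
    correct = sum(
        answerer(summary, question).strip().lower() == answer.strip().lower()
        for question, answer in qa_pairs
    )
    return correct / len(qa_pairs)

def select_best_summary(candidates, qa_pairs, answerer=answer_with_llm):
    """Return the candidate summary whose answers best match the references."""
    return max(candidates,
               key=lambda s: qa_consistency_score(s, qa_pairs, answerer))
```

Summaries scoring above a threshold would then form the fine-tuning set for the best-performing LLM.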
Mohamed Imed Eddine Ghebriout
Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
Gaël Guibon
Université de Lorraine, CNRS, Inria, LORIA, F-54000 Nancy, France
Ivan Lerner
Inserm, Centre de Recherche des Cordeliers, Université Paris Cité, Sorbonne Université, F-75006 Paris, France
Emmanuel Vincent
Senior Research Scientist, Inria
speech & audio