🤖 AI Summary
Existing query-focused summarization (QFS) methods struggle to comprehensively represent diverse perspectives on controversial queries (e.g., “Is law school worth it?”), while large language model (LLM)-generated summaries often suffer from viewpoint imbalance and lack of provenance.
Method: We propose MODS, a multi-LLM QFS framework that simulates a human panel discussion. Each document is treated as an independent “speaker,” and a “moderator” LLM selects which speakers respond to each planned topic and writes a tailored retrieval query for them. The speakers' responses are recorded in a source-attributed, structured outline that guides the final summary toward balanced perspectives and full document coverage. The approach combines multi-LLM collaboration, citation-aware content planning, and debate-informed summarization.
Results: On ConflictingQA and our newly curated DebateQFS benchmark, MODS improves topic-paragraph coverage and perspective balance by 38–59% over state-of-the-art baselines, as measured by new citation-based metrics. Users also find its summaries readable and more balanced.
📝 Abstract
Query-focused summarization (QFS) produces a summary of documents that answers a query. Past QFS work assumes queries have one answer, ignoring debatable ones (e.g., “Is law school worth it?”). We introduce Debatable QFS (DQFS), a task to create summaries that answer debatable queries via documents with opposing perspectives; summaries must comprehensively cover all sources and balance perspectives, favoring no side. These goals elude LLM QFS systems, which: 1) lack structured content plans, failing to guide LLMs to write balanced summaries, and 2) use the same query to retrieve contexts across documents, failing to cover all perspectives specific to each document's content. To overcome this, we design MODS, a multi-LLM framework mirroring human panel discussions. MODS treats documents as individual Speaker LLMs and has a Moderator LLM that picks speakers to respond to tailored queries for planned topics. Speakers use tailored queries to retrieve relevant contexts from their documents and supply perspectives, which are tracked in a rich outline, yielding a content plan to guide the final summary. Experiments on ConflictingQA with controversial web queries and DebateQFS, our new dataset of debate queries from Debatepedia, show MODS beats SOTA by 38–59% in topic paragraph coverage and balance, based on new citation metrics. Users also find MODS's summaries to be readable and more balanced.
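The moderator/speaker loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: every name here (`Speaker`, `pick_speakers`, `tailor_query`, `build_outline`, `uncovered`) is hypothetical, the LLM calls are replaced by simple word-overlap stubs, and the `stance` label is supplied by hand, whereas a real system would presumably infer perspectives from the text. The sketch only shows the control flow: for each planned topic, pick speakers, give each a tailored query, and record the cited evidence in an outline.

```python
import re
from dataclasses import dataclass


def words(text: str) -> set[str]:
    """Lowercased word set, used for toy lexical matching."""
    return set(re.findall(r"\w+", text.lower()))


@dataclass
class Speaker:
    """One source document wrapped as a panel speaker (stub for a Speaker LLM)."""
    doc_id: str
    stance: str              # hand-assigned toy label (assumption)
    paragraphs: list[str]

    def respond(self, query: str) -> str:
        """Return the paragraph with the largest word overlap with the query."""
        return max(self.paragraphs, key=lambda p: len(words(query) & words(p)))


def pick_speakers(topic: str, speakers: list[Speaker]) -> list[Speaker]:
    """Moderator stub: choose speakers whose document mentions the topic."""
    return [s for s in speakers
            if any(words(topic) & words(p) for p in s.paragraphs)]


def tailor_query(topic: str, speaker: Speaker) -> str:
    """Moderator stub: phrase the topic differently for each speaker."""
    return f"{topic} ({speaker.stance} side)"


def build_outline(topics: list[str], speakers: list[Speaker]) -> dict:
    """Map each topic to source-attributed entries from the picked speakers."""
    outline = {}
    for topic in topics:
        outline[topic] = [
            {"doc_id": s.doc_id,
             "stance": s.stance,
             "evidence": s.respond(tailor_query(topic, s))}
            for s in pick_speakers(topic, speakers)
        ]
    return outline


def uncovered(outline: dict, speakers: list[Speaker]) -> list[str]:
    """List documents that no outline entry cites yet."""
    cited = {e["doc_id"] for entries in outline.values() for e in entries}
    return [s.doc_id for s in speakers if s.doc_id not in cited]


# Toy documents for the query "Is law school worth it?" (invented for illustration).
speakers = [
    Speaker("d1", "yes", ["Law graduates earn a higher salary over time.",
                          "Tuition debt is manageable with loan forgiveness."]),
    Speaker("d2", "no", ["Tuition debt often exceeds the salary gains.",
                         "The job market for lawyers is saturated."]),
    Speaker("d3", "no", ["Bar exam pass rates vary by school."]),
]

outline = build_outline(["salary", "debt"], speakers)
for topic, entries in outline.items():
    for e in entries:
        print(f"{topic}: [{e['doc_id']}, {e['stance']}] {e['evidence']}")
print("uncovered:", uncovered(outline, speakers))
```

In this toy run each topic gathers one "yes" and one "no" entry, while `d3` is never cited. A check like `uncovered` hints at how an outline could be used to spot coverage gaps before the final summary is written; how MODS actually plans topics and measures coverage and balance is specified in the paper, not here.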