AI Summary
Users frequently pose unanswerable questions due to knowledge gaps; while large language models (LLMs) can detect such queries, they perform poorly at guiding users to reformulate them into answerable, information-focused versions (e.g., GPT-3.5 achieves only 23.03% rewriting accuracy). Method: We propose a zero-shot, structured question reformulation framework that integrates LLM-based semantic understanding with depth-first search (DFS) over predefined entity constraints, enabling controllable and interpretable question rewriting without fine-tuning or labeled data. Contribution/Results: Our approach improves rewriting accuracy to 70.42% on GPT-3.5 and 56.75% on Gemma2-9B, substantially outperforming baselines. To our knowledge, this is the first work to incorporate symbolic search into LLM-driven question reformulation, jointly optimizing generation quality and reasoning traceability.
Abstract
Question answering is a core capability of large language models (LLMs). However, when readers encounter unfamiliar knowledge in a text, they often ask questions that the text itself cannot answer, because they do not yet understand the underlying information well enough. Recent studies show that while LLMs can detect unanswerable questions, they struggle to help users reformulate them. Even advanced models such as GPT-3.5 are of limited help here. To address this limitation, we propose DRS: Deep Question Reformulation with Structured Output, a novel zero-shot method that enhances LLMs' ability to help users reformulate questions so that they extract relevant information from new documents. DRS combines the strengths of LLMs with a DFS-based algorithm that iteratively explores potential entity combinations and constrains outputs using predefined entities. This structured approach significantly enhances the reformulation capabilities of LLMs. Comprehensive experimental evaluations demonstrate that DRS improves the reformulation accuracy of GPT-3.5 from 23.03% to 70.42%, while also improving open-source models such as Gemma2-9B from 26.35% to 56.75%.