🤖 AI Summary
Ambiguous user queries pose a fundamental challenge in information retrieval, and existing conversational systems clarify intent inefficiently because their question-asking strategies are suboptimal. To address this, we propose SherlockLLM, a reinforcement learning based, annotation-free framework in which an agent generates sequences of binary (yes/no) questions to actively elicit user intent. Our method unifies structured tasks (e.g., attribute filtering) and unstructured tasks (e.g., open-ended description), and introduces a comprehensive, task-agnostic evaluation benchmark. Experiments show that SherlockLLM achieves near-optimal performance on structured tasks, approaching the theoretical lower bound set by binary search, while substantially outperforming strong baselines on unstructured tasks. Crucially, our work is the first to learn effective, generalizable intent-clarification strategies without requiring large-scale labeled data, and it applies across diverse retrieval scenarios.
📝 Abstract
User queries in information retrieval are often ambiguous, making it difficult for systems to identify a user's target from a single query. While recent dialogue-based interactive retrieval systems can clarify user intent, they are inefficient because they often lack an explicit strategy for asking the most informative questions. To address this limitation, we propose SherlockLLM, a dialogue-driven retrieval framework that learns an optimal questioning strategy via Reinforcement Learning (RL) and avoids the need for large-scale annotated dialogue data. In our framework, an agent is trained to generate a sequence of binary questions that efficiently narrows down the search space. To validate our approach, we introduce a benchmark covering both structured and unstructured tasks. Experimental results show that SherlockLLM is a robust and efficient solution. On the structured tasks, its performance matches strong baselines and approaches the theoretical optimum defined by binary search. On the challenging unstructured task, our agent significantly outperforms these baselines, demonstrating that it learns a highly effective information-seeking dialogue policy.
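The binary-search optimum the abstract refers to can be illustrated with a toy sketch (the items, oracle, and `elicit` helper below are hypothetical, not from the paper): an agent whose every yes/no question splits the remaining candidates in half identifies the target among N items in ⌈log₂ N⌉ questions, which is the best any binary-question policy can guarantee.

```python
import math

def elicit(candidates, target, oracle):
    """Narrow a candidate set with binary (yes/no) questions.

    Each question asks whether the target lies in the first half of the
    remaining candidates, so the search space halves on every turn.
    """
    remaining = list(candidates)
    questions = 0
    while len(remaining) > 1:
        half = remaining[:len(remaining) // 2]
        questions += 1
        if oracle(target, half):  # "Is your target one of these?"
            remaining = half
        else:
            remaining = remaining[len(half):]
    return remaining[0], questions

# Toy example: 16 candidate items, user's (simulated) target is item 11.
items = list(range(16))
found, asked = elicit(items, 11, lambda t, subset: t in subset)
print(found, asked)  # 11 4  (4 = ceil(log2(16)) questions)
```

A learned policy like SherlockLLM's aims to approach this question count on structured tasks, where such clean half-splitting questions exist; on unstructured tasks no fixed splitting rule is available, which is where a learned strategy pays off.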