LLM-as-an-Investigator: Evidence-First Reasoning for Robust Interactive Problem Diagnosis

📅 2026-06-11

🤖 AI Summary

This work addresses the tendency of large language models, in interactive technical diagnosis, to accept incomplete or erroneous assumptions supplied by the user, a behavior the authors call user-driven sycophancy. To mitigate it, the authors propose an evidence-first approach built around an “investigator” agent that estimates the ambiguity of the initial problem description, generates candidate hypotheses, asks targeted clarifying questions, and updates hypothesis probabilities via Bayesian inference after each answer, so that evidence takes priority over accommodating the user's assumption. Evaluation uses a three-agent pipeline: a problem-solution extractor converts solved forum threads into structured cases, a ground-truth evaluator simulates the user while hiding the known solution, and the tested assistant tries to recover the solution through dialogue. On threads from mechanical, electrical, and hydraulic domains, the approach diagnoses problems more accurately than direct prompting and reasoning-only baselines and reduces adherence to misleading user hypotheses.
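The hypothesis-update step described in this summary can be illustrated with a small Bayesian loop. This is only a sketch under stated assumptions, not the authors' implementation: the hypothesis names, the likelihood values, the use of Shannon entropy as an ambiguity score, and the 0.8 stopping threshold are all hypothetical choices made for illustration.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return {h: w / total for h, w in weights.items()}


def entropy_bits(belief):
    """Shannon entropy in bits; used here as an assumed ambiguity score."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def bayes_update(prior, likelihood):
    """posterior(h) is proportional to prior(h) * P(answer | h)."""
    return normalize({h: prior[h] * likelihood[h] for h in prior})


def investigate(prior, answer_likelihoods, threshold=0.8):
    """Update beliefs after each answer; stop once one hypothesis dominates.

    Returns (leading hypothesis, final belief, number of answers used).
    """
    belief = dict(prior)
    used = 0
    for likelihood in answer_likelihoods:
        belief = bayes_update(belief, likelihood)
        used += 1
        if max(belief.values()) >= threshold:
            break
    leader = max(belief, key=belief.get)
    return leader, belief, used


# Hypothetical hydraulic case: three equally likely candidate causes.
prior = normalize({"worn pump": 1, "clogged filter": 1, "cylinder leak": 1})

# Assumed P(observed answer | hypothesis) for two clarifying questions.
answers = [
    {"worn pump": 0.2, "clogged filter": 0.7, "cylinder leak": 0.2},
    {"worn pump": 0.1, "clogged filter": 0.8, "cylinder leak": 0.3},
]

leader, belief, used = investigate(prior, answers)
print(leader, round(belief[leader], 3), used)
```

In this toy run the agent starts from maximal ambiguity (a uniform prior over three hypotheses) and commits only after the second answer pushes one hypothesis past the threshold. In a real system the likelihoods would have to come from the model's own assessment of each answer, which this sketch does not attempt to reproduce.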

📝 Abstract

Large language models (LLMs) are increasingly used as interactive assistants for technical problem solving. However, when users provide incomplete descriptions or plausible but unverified explanations, LLMs may prematurely align with these assumptions and propose solutions before collecting sufficient evidence. We refer to this behavior as user-driven sycophancy: the tendency of an LLM to reinforce a user-provided hypothesis instead of testing alternative explanations. This paper introduces LLM-as-an-Investigator, an evidence-first agentic AI methodology for robust problem diagnosis. The approach is implemented through a Solution Investigator Agent, which estimates the ambiguity of an initial problem description, generates candidate hypotheses, asks targeted clarification questions, and updates hypothesis probabilities after each answer. Rather than producing an immediate response, the agent continues the investigation until the evidence makes one candidate explanation stronger than the alternatives. To evaluate the approach, we build a benchmark from solved technical forum threads in mechanical, electrical, and hydraulic domains. We use a three-agent evaluation pipeline in which a Problem-Solution Extractor Agent converts solved threads into structured cases, a Ground-Truth Evaluator Agent simulates the user while hiding the known solution, and the tested assistant attempts to recover the solution through dialogue. The experiments compare standard assistants, reasoning-oriented LLMs, and the proposed investigator-based model across LLM backbones. In addition to diagnostic accuracy, we analyze how standard assistants follow misleading user hypotheses in diagnostic cases. The results show that the proposed approach identifies the problem more accurately than direct prompting and reasoning-only baselines, while its evidence-first protocol helps reduce user-induced conversational bias.
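The evaluation dialogue described in the abstract can be sketched as a loop between a simulated user and the assistant under test. Everything below is a hypothetical stand-in: the `Case` fields, the `SimulatedUser` class, the exact-match judge, and the two rule-based assistants are assumptions for illustration, whereas the paper's agents are LLM-based and their prompts and scoring are not given here.

```python
from dataclasses import dataclass


@dataclass
class Case:
    """Structured case, as an extractor agent might produce (assumed schema)."""
    problem: str    # initial description shown to the assistant
    facts: dict     # topic -> detail the simulated user can reveal
    solution: str   # hidden ground truth, never shown to the assistant


class SimulatedUser:
    """Stand-in for the ground-truth evaluator: answers questions, hides the fix."""

    def __init__(self, case):
        self._case = case

    def opening(self):
        return self._case.problem

    def answer(self, topic):
        return self._case.facts.get(topic, "I don't know.")

    def judge(self, proposed):
        # Assumption: exact string match; a real judge would be more lenient.
        return proposed.strip().lower() == self._case.solution.lower()


def run_dialogue(assistant, user, max_turns=5):
    """Alternate turns until the assistant diagnoses or the turn budget ends."""
    transcript = [("user", user.opening())]
    for _ in range(max_turns):
        kind, text = assistant(transcript)
        transcript.append(("assistant", text))
        if kind == "diagnose":
            return user.judge(text), transcript
        transcript.append(("user", user.answer(text)))
    return False, transcript


def direct_assistant(transcript):
    """Toy baseline: adopts the user's stated hypothesis immediately."""
    return ("diagnose", "worn pump")


def investigator_assistant(transcript):
    """Toy investigator: asks one question before committing to a diagnosis."""
    if len(transcript) == 1:
        return ("ask", "filter condition")
    last_answer = transcript[-1][1]
    if "red" in last_answer:
        return ("diagnose", "clogged filter")
    return ("diagnose", "worn pump")


case = Case(
    problem="The cylinder extends slowly. I think the pump is worn.",
    facts={"filter condition": "The return filter indicator is in the red."},
    solution="clogged filter",
)

direct_ok, _ = run_dialogue(direct_assistant, SimulatedUser(case))
invest_ok, invest_log = run_dialogue(investigator_assistant, SimulatedUser(case))
print(direct_ok, invest_ok, len(invest_log))
```

The toy case contains a misleading user hypothesis, so the baseline that follows it fails while the assistant that gathers one piece of evidence first recovers the hidden solution. This mirrors the kind of comparison the abstract describes, but it says nothing about the paper's measured results.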

Problem

Research questions and friction points this paper is trying to address.

user-driven sycophancy

problem diagnosis

evidence collection

conversational bias

hypothesis validation

Innovation

Methods, ideas, or system contributions that make the work stand out.

evidence-first reasoning

LLM-as-an-Investigator

user-driven sycophancy