MeDxAgent: Multi-Agent Consultation for Interactive Medical Diagnosis

Date: 2026-06-02
Citations: 0
Influential: 0

AI Summary
This study addresses a limitation of existing evaluations of large language models in medical diagnosis: they typically assume complete information and do not capture the iterative, interactive reasoning of real-world clinical practice. To bridge this gap, the authors introduce MeDxBench, a new evaluation benchmark comprising 4,421 clinical cases across 20 specialties, and propose MeDxAgent, a multi-agent system designed for interactive medical diagnosis that simulates physician consultation dynamics. MeDxAgent combines three design choices: collecting demographic data first, summarizing the dialogue before passing it on for diagnosis, and guiding follow-up questions with the current candidate diagnoses. Experimental results show that MeDxAgent achieves a 10.3% absolute improvement in diagnostic accuracy over the baseline on MeDxBench and closes 52.3% of the performance gap to an ideal full-information model.
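The two headline numbers are linked by a simple ratio, which the following minimal sketch makes explicit. It assumes that "gap closed" means (system − baseline) / (oracle − baseline) and that the 10.3% gain is measured in absolute accuracy points on the same metric; the page does not report the underlying baseline or oracle accuracies, so the function names and the derived gap are illustrative only.

```python
def gap_closed(baseline: float, system: float, oracle: float) -> float:
    """Fraction of the baseline-to-oracle gap recovered by the system.

    Assumed definition; the source does not spell out its formula.
    """
    if oracle <= baseline:
        raise ValueError("oracle accuracy must exceed baseline accuracy")
    return (system - baseline) / (oracle - baseline)


def implied_gap(gain_points: float, fraction_closed: float) -> float:
    """Baseline-to-oracle gap implied by an absolute gain and a closed fraction."""
    if not 0 < fraction_closed <= 1:
        raise ValueError("fraction_closed must be in (0, 1]")
    return gain_points / fraction_closed


# Figures quoted on this page: 10.3-point gain, 52.3% of the gap closed.
# Under the assumptions above, the full gap would be about 19.7 points.
gap = implied_gap(10.3, 0.523)
print(f"implied baseline-to-oracle gap: {gap:.1f} points")
```

Read this way, roughly 9.4 points of accuracy would remain between MeDxAgent and the full-information oracle, though that figure is derived here and not stated by the authors.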
Abstract
Large language models (LLMs) are increasingly used for health-related decision support. Yet most evaluations treat diagnosis as a single-shot task with complete information provided upfront, often as a multiple-choice selection. This diverges from clinical practice, where diagnosis is interactive and open-ended, involving sequential hypothesis refinement through targeted questioning. We address this gap. We build MeDxBench, a large-scale benchmark of 4,421 clinical cases across 20 specialties. We further propose MeDxAgent, a multi-agent consultation system for interactive diagnosis, and systematically study its prompt-, flow-, and agent-level design choices. MeDxAgent achieves a 10.3% accuracy gain over the baseline on MeDxBench, closing 52.3% of the gap to a full-information oracle. We find that three design choices (collecting demographics first, passing summarized dialogue for diagnosis, and feeding candidate diagnoses for targeted questioning) improve accuracy and mirror how physicians reason, though their full effect emerges only in combination. Code and dataset will be released upon publication.
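The three design choices named in the abstract can be sketched as a small consultation loop. This is a hypothetical illustration, not the authors' implementation (their code is not yet released): the `Consultation` class, its callable fields, and the toy rule-based stand-ins for the patient and the agents are all assumptions made here, and a real system would back each callable with an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Dialogue = List[Tuple[str, str]]  # (question, answer) pairs


@dataclass
class Consultation:
    """Hypothetical orchestration of the three design choices."""
    ask_patient: Callable[[str], str]
    summarize: Callable[[Dialogue], str]
    propose_candidates: Callable[[str], List[str]]
    next_question: Callable[[str, List[str]], str]
    max_turns: int = 3

    def run(self) -> List[str]:
        dialogue: Dialogue = []
        # Choice 1: collect demographics before anything else.
        opener = "What are your age and sex?"
        dialogue.append((opener, self.ask_patient(opener)))
        for _ in range(self.max_turns):
            # Choice 2: the diagnoser sees a summary, not the raw transcript.
            summary = self.summarize(dialogue)
            candidates = self.propose_candidates(summary)
            # Choice 3: candidate diagnoses steer the next question.
            question = self.next_question(summary, candidates)
            dialogue.append((question, self.ask_patient(question)))
        return self.propose_candidates(self.summarize(dialogue))


# Toy stand-ins so the sketch runs without any model.
def toy_patient(question: str) -> str:
    answers = {"age": "45-year-old male",
               "fever": "yes, for three days",
               "cough": "yes, productive"}
    for key, answer in answers.items():
        if key in question.lower():
            return answer
    return "not sure"


def toy_summarize(dialogue: Dialogue) -> str:
    return "; ".join(answer for _, answer in dialogue)


def toy_candidates(summary: str) -> List[str]:
    if "productive" in summary:
        return ["pneumonia"]
    if "three days" in summary:
        return ["pneumonia", "influenza"]
    return ["unknown"]


def toy_question(summary: str, candidates: List[str]) -> str:
    if "unknown" in candidates:
        return "Do you have a fever?"
    return "Do you have a cough?"


result = Consultation(toy_patient, toy_summarize,
                      toy_candidates, toy_question, max_turns=2).run()
print(result)
```

Keeping each role as a separate callable makes it easy to switch one design choice off at a time, for instance by replacing `summarize` with a function that returns the raw transcript. That is one way to probe the abstract's observation that the choices pay off fully only in combination.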
Problem

Research questions and friction points this paper is trying to address.

interactive diagnosis
medical decision support
clinical reasoning
large language models
diagnostic benchmark
Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-agent consultation
interactive medical diagnosis
MeDxBench
large language models
clinical reasoning