One Panel Does Not Fit All: Case-Adaptive Multi-Agent Deliberation for Clinical Prediction

📅 2026-03-31
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the sensitivity of large language models (LLMs) to minor prompt variations in clinical prediction, and the inability of existing single-agent or fixed-role multi-agent approaches to exploit the diagnostic signal carried by expert disagreement. The authors propose CAMP (Case-Adaptive Multi-agent Panel), a framework in which an attending-physician agent dynamically assembles a panel of specialist agents tailored to each case's diagnostic uncertainty. Specialists cast three-valued votes (KEEP/REFUSE/NEUTRAL) on each candidate, where NEUTRAL lets a specialist abstain outside their expertise. A hybrid router then sends each diagnosis down one of three paths: strong consensus, fallback to the attending physician's judgment, or evidence-based arbitration that weighs argument quality over vote counts. Evaluated on diagnostic prediction and brief hospital course generation from MIMIC-IV across four LLM backbones, CAMP consistently outperforms strong baselines while consuming fewer tokens than most competing multi-agent methods, and its voting records and arbitration traces provide auditable decision trails.
📝 Abstract
Large language models applied to clinical prediction exhibit case-level heterogeneity: simple cases yield consistent outputs, while complex cases produce divergent predictions under minor prompt changes. Existing single-agent strategies sample from one role-conditioned distribution, and multi-agent frameworks use fixed roles with flat majority voting, discarding the diagnostic signal in disagreement. We propose CAMP (Case-Adaptive Multi-agent Panel), where an attending-physician agent dynamically assembles a specialist panel tailored to each case's diagnostic uncertainty. Each specialist evaluates candidates via three-valued voting (KEEP/REFUSE/NEUTRAL), enabling principled abstention outside one's expertise. A hybrid router directs each diagnosis through strong consensus, fallback to the attending physician's judgment, or evidence-based arbitration that weighs argument quality over vote counts. On diagnostic prediction and brief hospital course generation from MIMIC-IV across four LLM backbones, CAMP consistently outperforms strong baselines while consuming fewer tokens than most competing multi-agent methods, with voting records and arbitration traces offering transparent decision audits.
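The three-path routing described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the actual routing criteria, so everything below is an assumption made for illustration: the `route` function, the `consensus_threshold` and `min_voters` parameters, their default values, and the rule that NEUTRAL votes are excluded from the tally are all hypothetical and may differ from what CAMP does.

```python
from collections import Counter
from enum import Enum


class Vote(Enum):
    """Three-valued specialist vote, as named in the abstract."""
    KEEP = "KEEP"
    REFUSE = "REFUSE"
    NEUTRAL = "NEUTRAL"  # abstention outside the specialist's expertise


def route(votes, consensus_threshold=0.8, min_voters=2):
    """Choose a decision path for one candidate diagnosis.

    Hypothetical rule (not taken from the paper): NEUTRAL votes are
    treated as abstentions and left out of the tally. Both thresholds
    are illustrative assumptions.
    """
    counts = Counter(votes)
    keep, refuse = counts[Vote.KEEP], counts[Vote.REFUSE]
    decisive = keep + refuse

    # Too few specialists took a position: defer to the attending agent.
    if decisive < min_voters:
        return "attending_fallback"

    # One side clearly dominates: accept the panel's consensus.
    if max(keep, refuse) / decisive >= consensus_threshold:
        return "consensus_keep" if keep > refuse else "consensus_refuse"

    # Genuine disagreement: hand the arguments to an arbiter, which
    # would weigh argument quality rather than vote counts.
    return "arbitration"


if __name__ == "__main__":
    panel = [Vote.KEEP, Vote.REFUSE, Vote.KEEP, Vote.REFUSE, Vote.NEUTRAL]
    print(route(panel))
```

In this sketch a split panel reaches the arbitration path, which is where a framework like CAMP would compare the specialists' arguments instead of counting votes. The arbitration step itself is not modeled here.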
Problem

Research questions and friction points this paper is trying to address.

clinical prediction
case-level heterogeneity
multi-agent deliberation
diagnostic uncertainty
large language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

case-adaptive
multi-agent deliberation
three-valued voting
hybrid routing
clinical prediction