🤖 AI Summary
This work addresses the sensitivity of large language models (LLMs) to minor prompt variations in clinical prediction and the inability of existing single-agent or fixed-role multi-agent approaches to harness the diagnostic signal in expert disagreement. To this end, the authors propose CAMP, a framework in which an attending-physician agent dynamically assembles a panel of specialist agents based on each case's diagnostic uncertainty. Specialists cast ternary votes (KEEP/REFUSE/NEUTRAL, i.e., support, oppose, or abstain), and a hybrid router directs each decision to strong consensus, the attending physician's judgment, or arbitration that weighs argument quality over vote count. CAMP introduces the first case-adaptive, dynamic multi-agent deliberation mechanism, uses abstention to handle out-of-domain queries, and improves both accuracy and transparency through argument-quality-based arbitration. Evaluated on MIMIC-IV, CAMP consistently outperforms strong baselines across four LLM backbones while using fewer inference tokens and producing auditable decision traces.
📝 Abstract
Large language models applied to clinical prediction exhibit case-level heterogeneity: simple cases yield consistent outputs, while complex cases produce divergent predictions under minor prompt changes. Existing single-agent strategies sample from one role-conditioned distribution, and multi-agent frameworks use fixed roles with flat majority voting, discarding the diagnostic signal in disagreement. We propose CAMP (Case-Adaptive Multi-agent Panel), in which an attending-physician agent dynamically assembles a specialist panel tailored to each case's diagnostic uncertainty. Each specialist evaluates candidates via three-valued voting (KEEP/REFUSE/NEUTRAL), enabling principled abstention outside one's expertise. A hybrid router routes each diagnosis to strong consensus, to the attending physician's judgment as a fallback, or to evidence-based arbitration that weighs argument quality over vote counts. On diagnostic prediction and brief hospital course generation from MIMIC-IV across four LLM backbones, CAMP consistently outperforms strong baselines while consuming fewer tokens than most competing multi-agent methods, with voting records and arbitration traces offering transparent decision audits.
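The hybrid routing described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the consensus threshold, the argument-quality scores, and the tie-breaking rule are all illustrative assumptions; only the vote labels (KEEP/REFUSE/NEUTRAL) and the three routes (consensus, attending fallback, arbitration) come from the abstract.

```python
from dataclasses import dataclass

# Hypothetical sketch of CAMP's hybrid router. Thresholds, quality scores,
# and tie-breaking are assumptions for illustration, not the paper's code.

KEEP, REFUSE, NEUTRAL = "KEEP", "REFUSE", "NEUTRAL"

@dataclass
class Vote:
    label: str       # KEEP / REFUSE / NEUTRAL
    quality: float   # assumed argument-quality score in [0, 1]

def route(votes, consensus_ratio=0.8, attending_vote=KEEP):
    """Return (decision, route_taken) for one candidate diagnosis."""
    decisive = [v for v in votes if v.label != NEUTRAL]  # NEUTRAL = abstain
    if not decisive:
        return attending_vote, "attending"  # whole panel abstained
    keeps = [v for v in decisive if v.label == KEEP]
    share = len(keeps) / len(decisive)
    if share >= consensus_ratio:
        return KEEP, "consensus"
    if 1 - share >= consensus_ratio:
        return REFUSE, "consensus"
    # Split panel: arbitrate on total argument quality, not head count.
    keep_q = sum(v.quality for v in keeps)
    refuse_q = sum(v.quality for v in decisive if v.label == REFUSE)
    if keep_q == refuse_q:
        return attending_vote, "attending"  # tie falls back to the attending
    return (KEEP if keep_q > refuse_q else REFUSE), "arbitration"
```

For example, a panel voting `[KEEP(0.9), REFUSE(0.4), REFUSE(0.3)]` is routed to arbitration, where the single high-quality KEEP argument (0.9) outweighs the two weaker REFUSE arguments (0.7 combined) despite losing the head count, which is the intended contrast with flat majority voting.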