Collaboration among Multiple Large Language Models for Medical Question Answering

📅 2025-05-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing medical question-answering (QA) research lacks a systematic investigation of how multiple large language models (LLMs) can combine complementary expertise and reasoning. Method: This paper proposes the first post-hoc multi-LLM collaboration framework for medical multiple-choice QA, comprising three components: (1) post-hoc analysis of three pre-trained LLMs; (2) consensus-driven collaborative reasoning; and (3) confidence-aware dynamic evaluation. Contribution/Results: The framework aligns confidence across models and suppresses disagreement in medical QA, substantially improving each individual model's accuracy. Empirical analysis demonstrates a strong positive correlation between model confidence and prediction accuracy, providing both theoretical grounding and a practical paradigm for trustworthy multi-LLM collaboration. The framework advances robust, interpretable, and reliable medical QA through principled ensemble reasoning grounded in calibrated confidence estimation.

📝 Abstract
Empowered by vast internal knowledge reservoirs, the new generation of large language models (LLMs) demonstrates untapped potential for tackling medical tasks. However, little effort has been made toward drawing a synergistic effect from multiple LLMs' expertise and backgrounds. In this study, we propose a multi-LLM collaboration framework tailored to a medical multiple-choice question dataset. Through post-hoc analysis of three pre-trained LLM participants, our framework is shown to boost the reasoning ability of all LLMs and to reduce their divergence across questions. We also measure an LLM's confidence when it confronts adversarial opinions from other LLMs and observe agreement between an LLM's confidence and its prediction accuracy.
Problem

Research questions and friction points this paper is trying to address.

Enhancing medical question answering via multi-LLM collaboration
Reducing divergence among LLMs in medical reasoning tasks
Measuring LLM confidence against adversarial peer opinions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-LLM collaboration framework for medical QA
Post-hoc analysis boosts reasoning and reduces divergence
Confidence measurement aligns with prediction accuracy
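The paper does not publish its aggregation code, but the described mechanism (consensus over several LLMs' answers, weighted by each model's self-reported confidence) can be sketched as a confidence-weighted vote. Everything below is an illustrative assumption: the model names, the fixed votes, and the simple additive scoring are hypothetical, not the authors' implementation.

```python
from collections import defaultdict

def aggregate_answers(votes):
    """Confidence-weighted vote over multiple-choice answers.

    votes: list of (model_name, choice, confidence) triples, with
    confidence in [0, 1]. Each model's confidence is added to the
    score of its chosen option; the option with the highest total
    score wins. Returns (winning_choice, winning_score).
    """
    scores = defaultdict(float)
    for _model, choice, confidence in votes:
        scores[choice] += confidence
    best = max(scores, key=scores.get)
    return best, scores[best]

# Hypothetical outputs from three LLM participants on one question.
votes = [
    ("llm_a", "B", 0.75),
    ("llm_b", "B", 0.50),
    ("llm_c", "C", 0.875),
]
answer, score = aggregate_answers(votes)
print(answer, score)  # → B 1.25
```

Under this scheme, a majority of low-confidence models can still be outvoted by one highly confident model, which is consistent with the paper's observation that confidence tracks prediction accuracy.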