A Super-Learner with Large Language Models for Medical Emergency Advising

📅 2025-11-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Large language models (LLMs) show limited and uneven diagnostic accuracy for acute conditions in emergency medicine. Method: The authors propose MEDAS (Medical Emergency Diagnostic Advising System), a super-learner that uses a meta-learner to combine the diagnostic outputs of five LLMs (Gemini, Llama, Grok, GPT, and Claude) and to learn the different capabilities of each model. Contribution/Results: On real emergency cases, individual LLM accuracy ranges from 58% to 65%, which the authors report exceeds the published accuracy of human doctors. Even with a quite basic meta-learner, the super-learner reaches 70% diagnostic accuracy, above every individual model. In 85% of cases at least one of the five LLMs gives the correct diagnosis, which suggests headroom for a stronger meta-learner. The results suggest that a meta-learning approach can exploit the combined knowledge of the medical datasets used to train the individual LLMs.

📝 Abstract

Medical decision-support and advising systems are critical for emergency physicians to quickly and accurately assess patients' conditions and make diagnoses. Artificial Intelligence (AI) has emerged as a transformative force in healthcare in recent years, and Large Language Models (LLMs) have been employed in various fields of medical decision-support systems. We studied the responses of a group of different LLMs to real cases in emergency medicine. The results of our study on the five most renowned LLMs showed significant differences in their capabilities for diagnosing acute diseases in medical emergencies, with accuracy ranging between 58% and 65%. This accuracy significantly exceeds the reported accuracy of human doctors. We built a super-learner, MEDAS (Medical Emergency Diagnostic Advising System), from five major LLMs (Gemini, Llama, Grok, GPT, and Claude). The super-learner produces higher diagnostic accuracy, 70%, even with a quite basic meta-learner. However, at least one of the integrated LLMs in the same super-learner produces the correct diagnosis in 85% of cases. The super-learner integrates a cluster of LLMs using a meta-learner capable of learning the different capabilities of each LLM, so that the collective capabilities of all LLMs in the cluster raise the diagnostic accuracy of the model. The results of our study showed that the aggregated diagnostic accuracy provided by a meta-learning approach exceeds that of any individual LLM, suggesting that the super-learner can take advantage of the combined knowledge of the medical datasets used to train the group of LLMs.
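The percentages quoted in the abstract measure different things, and a small sketch may help keep them apart. The Python below computes per-model accuracy and the share of cases in which at least one model is correct. Reading the 85% figure as that "at least one model correct" share is an assumption, and the model names, diagnoses, and helper functions are hypothetical toy values, not data from the study.

```python
from typing import Dict, List


def per_model_accuracy(preds: Dict[str, List[str]], truth: List[str]) -> Dict[str, float]:
    """Fraction of cases each model diagnoses correctly."""
    return {
        model: sum(p == t for p, t in zip(model_preds, truth)) / len(truth)
        for model, model_preds in preds.items()
    }


def any_model_correct_rate(preds: Dict[str, List[str]], truth: List[str]) -> float:
    """Fraction of cases in which at least one model gives the right diagnosis.

    This is an upper bound on what a meta-learner that only selects among
    the models' answers could achieve.
    """
    hits = sum(
        any(model_preds[i] == t for model_preds in preds.values())
        for i, t in enumerate(truth)
    )
    return hits / len(truth)


# Toy data (hypothetical): four cases, three models.
truth = ["stroke", "sepsis", "mi", "pe"]
preds = {
    "model_a": ["stroke", "sepsis", "angina", "pneumonia"],
    "model_b": ["migraine", "sepsis", "mi", "pneumonia"],
    "model_c": ["stroke", "uti", "mi", "pneumonia"],
}

individual = per_model_accuracy(preds, truth)
coverage = any_model_correct_rate(preds, truth)
print(individual)
print(coverage)
```

In this toy example every model is right on half of the cases, yet at least one model is right on three of the four, so a good combiner has room to beat each individual model.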

Problem

Research questions and friction points this paper is trying to address.

Improving diagnostic accuracy of large language models for acute diseases in emergency medicine

Addressing significant capability differences among individual LLMs in medical diagnostics

Leveraging collective knowledge of multiple LLMs through meta-learning for better emergency advising

Innovation

Methods, ideas, or system contributions that make the work stand out.

Super-learner integrates five different LLMs for medical diagnosis

Meta-learner combines individual LLM capabilities to boost accuracy

System leverages collective knowledge from multiple medical datasets
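As a rough illustration of how a basic meta-learner could combine several models, the sketch below weights each model's vote by its accuracy on labelled training cases. This is an assumed, generic accuracy-weighted vote, not the meta-learner used in MEDAS, whose details the abstract does not give. The class name, model names, and diagnoses are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


class WeightedVoteMetaLearner:
    """Accuracy-weighted voting over the diagnoses of several base models."""

    def fit(self, train_preds: Dict[str, List[str]], truth: List[str]) -> "WeightedVoteMetaLearner":
        # Weight of each model = its accuracy on the labelled training cases.
        n = len(truth)
        self.weights = {
            model: sum(p == t for p, t in zip(model_preds, truth)) / n
            for model, model_preds in train_preds.items()
        }
        return self

    def predict(self, case_preds: Dict[str, str]) -> str:
        # Sum the weights of the models backing each candidate diagnosis
        # and return the diagnosis with the highest total.
        scores: Dict[str, float] = defaultdict(float)
        for model, diagnosis in case_preds.items():
            scores[diagnosis] += self.weights.get(model, 0.0)
        return max(scores, key=scores.get)


# Toy training data (hypothetical): four labelled cases, three models.
train_truth = ["stroke", "sepsis", "mi", "pe"]
train_preds = {
    "model_a": ["stroke", "sepsis", "mi", "pe"],            # 4/4 correct
    "model_b": ["migraine", "uti", "mi", "pe"],             # 2/4 correct
    "model_c": ["migraine", "sepsis", "angina", "pneumonia"],  # 1/4 correct
}

meta = WeightedVoteMetaLearner().fit(train_preds, train_truth)

# New case: the two weaker models agree, the strongest model disagrees.
new_case = {
    "model_a": "appendicitis",
    "model_b": "gastroenteritis",
    "model_c": "gastroenteritis",
}
result = meta.predict(new_case)
print(meta.weights)
print(result)
```

Here the strongest model's weight (1.0) outweighs the combined weight of the two that disagree with it (0.5 + 0.25), so the weighted vote differs from a simple majority vote. A meta-learner that tracks accuracy per condition rather than overall would be closer to "learning different capabilities of each LLM", at the cost of needing more labelled cases.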
