XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare

📅 2024-05-10
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Problem: Scarce labeled patient data limits the reliability of zero- and few-shot clinical decision support by large language models (LLMs). Method: We propose an explainable AI (XAI)-enhanced collaborative framework that integrates clinical machine learning (ML) models with LLMs, using multi-layered structured prompting to inject domain knowledge and supporting two human–AI interaction styles, Numerical Conversational (NC) and Natural Language Single-Turn (NL-ST), for zero- and few-shot in-context learning. Results: Evaluated on 920 real-world patient cases, XAI augmentation substantially narrows the few-shot diagnostic accuracy gap between LLMs and conventional ML models; NC-based inference approaches ML performance as the number of examples grows; cost-sensitive accuracy is comparable or superior, while gender bias and false-negative rates are markedly reduced. The work establishes an ML–LLM co-designed XAI paradigm for trustworthy, equitable, and resource-efficient clinical AI.

📝 Abstract
The integration of Large Language Models (LLMs) into healthcare diagnostics offers a promising avenue for clinical decision-making. This study outlines the development of a novel method for zero-shot/few-shot in-context learning (ICL) by integrating medical domain knowledge using a multi-layered structured prompt. We also explore the efficacy of two communication styles between the user and LLMs: the Numerical Conversational (NC) style, which processes data incrementally, and the Natural Language Single-Turn (NL-ST) style, which employs long narrative prompts. Our study systematically evaluates the diagnostic accuracy and risk factors, including gender bias and false negative rates, using a dataset of 920 patient records in various few-shot scenarios. Results indicate that traditional clinical machine learning (ML) models generally outperform LLMs in zero-shot and few-shot settings. However, the performance gap narrows significantly when employing few-shot examples alongside effective explainable AI (XAI) methods as sources of domain knowledge. Moreover, with sufficient time and an increased number of examples, the conversational style (NC) nearly matches the performance of ML models. Most notably, LLMs demonstrate comparable or superior cost-sensitive accuracy relative to ML models. This research confirms that, with appropriate domain knowledge and tailored communication strategies, LLMs can significantly enhance diagnostic processes. The findings highlight the importance of optimizing the number of training examples and communication styles to improve accuracy and reduce biases in LLM applications.
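The two communication styles contrasted in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function names, field names, and the heart-disease question framing are assumptions made for illustration, based only on the style descriptions above.

```python
# Minimal sketch of the two prompt styles (hypothetical wording and fields).

def nc_prompts(patient: dict) -> list[str]:
    """Numerical Conversational (NC) style: feed features incrementally,
    one short conversational turn per measurement."""
    return [f"The patient's {name} is {value}." for name, value in patient.items()]

def nl_st_prompt(patient: dict) -> str:
    """Natural Language Single-Turn (NL-ST) style: one long narrative
    prompt describing all features at once."""
    narrative = ", ".join(f"{name} of {value}" for name, value in patient.items())
    return (
        "A patient presents with "
        + narrative
        + ". Based on these findings, estimate the likelihood of heart disease."
    )

patient = {"age": 63, "resting blood pressure": 145, "cholesterol": 233}
print(nc_prompts(patient)[0])   # first NC turn
print(nl_st_prompt(patient))    # single narrative prompt
```

The trade-off the abstract reports then becomes concrete: NC spreads the same information over many turns (more tokens and time, but near-ML accuracy with enough examples), while NL-ST packs it into one narrative prompt.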
Problem

Research questions and friction points this paper is trying to address.

Enhancing clinical decision support with equitable, high-recall LLMs
Reducing gender bias in healthcare predictions using narrative prompts
Integrating domain knowledge for structured clinical data processing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Knowledge-guided ICL framework for clinical data
Domain-specific feature groupings and few-shot examples
Narrative prompts reduce bias and enhance recall
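One way to realize the knowledge-guided ICL idea above is to use an ML model's feature importances as the XAI-derived domain knowledge injected into the prompt. The sketch below uses synthetic data and hypothetical feature names and prompt wording; the paper's actual XAI method and prompt structure may differ.

```python
# Hedged sketch: turn a clinical ML model's feature importances into a
# domain-knowledge preamble for an LLM prompt. Data and wording are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
# Synthetic labels: the first feature dominates, the third is irrelevant.
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

feature_names = ["age", "resting blood pressure", "cholesterol"]
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Rank features by learned importance (the XAI signal).
ranked = sorted(zip(feature_names, model.feature_importances_),
                key=lambda p: p[1], reverse=True)

# Format the ranking as a prompt preamble injecting domain knowledge.
preamble = ("Clinically, consider features in this importance order: "
            + ", ".join(f"{n} ({w:.2f})" for n, w in ranked) + ".")
print(preamble)
```

Prepending such a preamble (optionally with domain-specific feature groupings and a few labeled examples) gives the LLM the structured domain knowledge that, per the results above, narrows its accuracy gap to the ML model.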