Augmenting Black-box LLMs with Medical Textbooks for Biomedical Question Answering (Published in Findings of EMNLP 2024)

📅 2023-09-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
Problem: In biomedical question answering, large language models (LLMs) lack sufficient domain expertise, rely heavily on fine-tuning, and generalize poorly across heterogeneous knowledge sources. Method: We propose a plug-and-play, textbook-enhanced retrieval-augmented generation (RAG) framework that uses structured medical textbooks as a lightweight, high-fidelity knowledge source. It combines three core components (query enhancement, hybrid retrieval, and knowledge self-refinement) to inject authoritative domain knowledge at inference time without fine-tuning the model. Contribution/Results: The framework adapts to black-box LLMs (e.g., GPT-4-Turbo) in a zero-shot manner and improves accuracy by 11.6–16.6% across three biomedical QA benchmarks. With GPT-4-Turbo as the base model, it outperforms Med-PaLM 2 by 2–3%. Textbook-based retrieval substantially surpasses Wikipedia-based retrieval (+7.8–13.7%), providing the first empirical validation of structured medical textbooks as a viable and superior RAG knowledge source.
📝 Abstract
Large-scale language models (LLMs) like ChatGPT have demonstrated impressive abilities in generating responses based on human instructions. However, their use in the medical field can be challenging due to their lack of specific, in-depth knowledge. In this study, we present a system called LLMs Augmented with Medical Textbooks (LLM-AMT), designed to enhance the proficiency of LLMs in specialized domains. LLM-AMT integrates authoritative medical textbooks into the LLMs' framework using plug-and-play modules. These modules include a Query Augmenter, a Hybrid Textbook Retriever, and a Knowledge Self-Refiner. Together, they incorporate authoritative medical knowledge. Additionally, an LLM Reader aids in contextual understanding. Our experimental results on three medical QA tasks demonstrate that LLM-AMT significantly improves response quality, with accuracy gains ranging from 11.6% to 16.6%. Notably, with GPT-4-Turbo as the base model, LLM-AMT outperforms the specialized Med-PaLM 2 model, which was pre-trained on a massive medical corpus, by 2-3%. We found that, despite being 100x smaller in size, medical textbooks are a more effective retrieval corpus than Wikipedia in the medical domain, boosting performance by 7.8%-13.7%.
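The four modules named in the abstract can be pictured as a simple chain around a black-box model. The following is only a minimal sketch under stated assumptions, not the authors' implementation: the function names, prompts, toy passages, and the word-overlap ranking (a stand-in for the Hybrid Textbook Retriever, whose internals the abstract does not describe) are all hypothetical, and `fake_llm` is a scripted placeholder for a real model call.

```python
from typing import Callable, List, Set

# Any black-box model exposed as a text-in, text-out function (assumption).
LLM = Callable[[str], str]

# Hypothetical toy passages standing in for a medical textbook corpus.
TEXTBOOK_PASSAGES = [
    "Metformin is a first-line oral agent for type 2 diabetes mellitus.",
    "Beta blockers reduce heart rate and myocardial oxygen demand.",
    "Iron deficiency anemia typically presents with microcytic red cells.",
]


def tokenize(text: str) -> Set[str]:
    """Lowercase words with surrounding punctuation stripped."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return {w for w in words if w}


def augment_query(question: str, llm: LLM) -> str:
    """Query Augmenter (sketch): append LLM-suggested related terms."""
    expansion = llm(f"List medical terms related to: {question}")
    return f"{question} {expansion}"


def retrieve(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Retriever stand-in: rank passages by word overlap with the query."""
    q = tokenize(query)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]


def refine(question: str, passages: List[str], llm: LLM) -> List[str]:
    """Knowledge Self-Refiner (sketch): keep passages the LLM marks useful."""
    kept = []
    for p in passages:
        verdict = llm(
            f"Is this passage useful for answering '{question}'? "
            f"Answer yes or no.\nPassage: {p}"
        )
        if verdict.strip().lower().startswith("yes"):
            kept.append(p)
    return kept


def answer(question: str, passages: List[str], llm: LLM) -> str:
    """LLM Reader (sketch): answer using only the refined evidence."""
    candidates = retrieve(augment_query(question, llm), passages)
    evidence = refine(question, candidates, llm)
    context = "\n".join(evidence)
    return llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")


def fake_llm(prompt: str) -> str:
    """Scripted placeholder for a real black-box LLM (illustration only)."""
    if prompt.startswith("List medical terms"):
        return "diabetes metformin"
    if prompt.startswith("Is this passage"):
        return "yes" if "Metformin" in prompt else "no"
    return "Metformin"


if __name__ == "__main__":
    q = "What is a first-line drug for type 2 diabetes?"
    print(answer(q, TEXTBOOK_PASSAGES, fake_llm))
```

Because every stage talks to the model only through a plain callable, swapping `fake_llm` for a hosted model requires no change to the pipeline itself, which is the sense in which such modules are plug-and-play.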
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs for medical QA
Integrating medical textbooks into LLMs
Improving accuracy in specialized domains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Integrates medical textbooks into LLMs
Uses plug-and-play modules for enhancement
Boosts medical QA accuracy significantly