Cost-Effective, High-Performance Open-Source LLMs via Optimized Context Retrieval

📅 2024-09-23
📈 Citations: 2
Influential: 0
📄 PDF
🤖 AI Summary
Problem: Healthcare adoption of large language models (LLMs) is limited by concerns over factual accuracy, the high inference cost of proprietary models, and poor clinical deployability. Method: We propose an efficient optimization framework for open-domain medical question answering. It combines context-aware retrieval-augmented prompting with systematic prompt-engineering guidelines, evaluated along a cost–accuracy Pareto frontier. It also introduces OpenMedQA, a novel, clinically realistic benchmark for open-ended medical QA, and releases the open-source prompt_engine toolkit together with a structured reasoning database supporting chain-of-thought (CoT) and tree-of-thought (ToT) paradigms. Contribution/Results: Experiments demonstrate that optimized open-source LLMs match or approach the performance of proprietary models (e.g., GPT-4) across multiple medical QA tasks while reducing inference cost by 3–5×. All resources, including benchmarks, tools, prompts, and reasoning templates, are publicly released, substantially enhancing the reproducibility and clinical applicability of medical AI research.
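The context-aware retrieval-augmented prompting described above can be sketched as follows. This is a minimal illustration, not the actual prompt_engine API: the function names, the token-overlap relevance score, and the prompt template are all assumptions made for the example.

```python
# Minimal sketch of retrieval-augmented prompting (hypothetical names,
# not the real prompt_engine toolkit).

def score(question: str, snippet: str) -> float:
    """Crude relevance score: fraction of question tokens present in the snippet."""
    q = set(question.lower().split())
    s = set(snippet.lower().split())
    return len(q & s) / max(len(q), 1)

def build_prompt(question: str, corpus: list[str], k: int = 2) -> str:
    """Rank snippets by relevance and prepend the top-k as context."""
    ranked = sorted(corpus, key=lambda snip: score(question, snip), reverse=True)
    context = "\n".join(ranked[:k])
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

corpus = [
    "Metformin is a first-line therapy for type 2 diabetes.",
    "Aspirin inhibits platelet aggregation.",
    "Insulin resistance is central to type 2 diabetes.",
]
prompt = build_prompt("What is the first-line therapy for type 2 diabetes?", corpus)
```

A production system would replace the token-overlap score with dense-embedding similarity, but the structure (retrieve, rank, prepend) is the same.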

📝 Abstract
Large Language Models (LLMs) in healthcare promise transformation, yet adoption is limited by concerns over factual accuracy and the high cost of proprietary models. This study demonstrates that optimized context retrieval unlocks cost-effective, high-performance healthcare AI using open-source LLMs, achieving a significantly improved cost-accuracy Pareto frontier for medical question answering and showing that open models can rival proprietary systems at a fraction of the cost. A key contribution is OpenMedQA, a novel benchmark for open-ended medical question answering that overcomes the limitations of multiple-choice formats, which we show lead to performance degradation in open-ended settings and often lack clinical realism. Further contributions include: (1) practical guidelines for implementing optimized context retrieval; (2) empirical validation of enhanced cost-effectiveness via the improved Pareto frontier; (3) the introduction of OpenMedQA for rigorous evaluation of open-ended medical QA; and (4) the release of prompt_engine alongside CoT/ToT/Thinking databases as community resources for cost-effective healthcare AI. By advancing optimized retrieval and open-ended QA benchmarking, we pave the way for more accessible and impactful LLM-powered healthcare solutions. All materials have been made publicly available.
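The cost-accuracy Pareto frontier referenced in the abstract can be computed with a simple dominance check: a model stays on the frontier unless some other model is at least as cheap and at least as accurate, and strictly better on one axis. The model names and numbers below are invented placeholders, not results from the paper.

```python
def pareto_frontier(models: list[tuple[str, float, float]]) -> list[str]:
    """Return names of models not dominated on (cost, accuracy).

    A model is dominated if another model has cost <= its cost and
    accuracy >= its accuracy, with at least one strict inequality.
    """
    frontier = []
    for name, cost, acc in models:
        dominated = any(
            c <= cost and a >= acc and (c < cost or a > acc)
            for _, c, a in models
        )
        if not dominated:
            frontier.append(name)
    return frontier

# Hypothetical (model, relative cost, accuracy) triples for illustration only.
models = [
    ("open-7B",   1.0, 0.62),
    ("open-70B",  4.0, 0.78),
    ("prop-api", 15.0, 0.80),
    ("open-13B",  2.5, 0.60),  # dominated by open-7B: cheaper AND more accurate
]
frontier = pareto_frontier(models)
```

Plotting accuracy against cost for the frontier models gives the kind of cost-accuracy curve the paper uses to compare optimized open-source LLMs against proprietary systems.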
Problem

Research questions and friction points this paper is trying to address.

Proprietary LLMs deliver strong medical QA performance, but their high inference cost limits adoption.
Concerns over factual accuracy hold back clinical deployment of LLMs in healthcare.
Multiple-choice benchmarks lack clinical realism and mask performance degradation in open-ended settings.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimized context retrieval yields an improved cost-accuracy Pareto frontier for open-source LLMs in healthcare.
The OpenMedQA benchmark enables rigorous evaluation of open-ended medical question answering.
Release of prompt_engine and CoT/ToT/Thinking databases provides community resources for cost-effective healthcare AI.
Jordi Bayarri-Planas
Barcelona Supercomputing Center (BSC), Barcelona, Spain
Ashwin Kumar Gururajan
Barcelona Supercomputing Center (BSC), Barcelona, Spain
Dario Garcia-Gasulla
Barcelona Supercomputing Center (BSC), Barcelona, Spain