Self-MedRAG: a Self-Reflective Hybrid Retrieval-Augmented Generation Framework for Reliable Medical Question Answering

📅 2026-01-08
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of hallucination in large language models on complex medical question answering, where conventional single-shot retrieval-augmented generation (RAG) struggles to support multi-step reasoning. To this end, the authors propose a self-reflective hybrid RAG framework that emulates the clinical “hypothesis–verification” workflow. The approach integrates BM25 and Contriever retrievers via reciprocal rank fusion (RRF), generates answers grounded in explicit reasoning chains, and incorporates a lightweight self-reflection module—based on either natural language inference (NLI) or a large language model—to iteratively verify and refine responses. Query reformulation is further employed to improve retrieval quality. Evaluated on MedQA and PubMedQA, the method achieves accuracy rates of 83.33% and 79.82%, respectively, significantly outperforming single-retriever baselines and reducing unsupported answers.
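The reciprocal rank fusion step mentioned above can be sketched in a few lines. This is a generic RRF implementation (the standard formula score(d) = Σ 1/(k + rank(d)), typically with k = 60), not the paper's code; the document names the fused retrievers (BM25 and Contriever) but not the fusion constant, so `k=60` is an assumption.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs.

    Each document's fused score is the sum, over every ranking that
    contains it, of 1 / (k + rank), where rank starts at 1. k=60 is the
    value commonly used in the RRF literature (an assumption here).
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)


# Toy example with hypothetical document IDs: a sparse (BM25-style)
# ranking and a dense (Contriever-style) ranking are fused.
bm25_hits = ["d3", "d1", "d7"]
dense_hits = ["d1", "d7", "d9"]
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
# "d1" ranks first: it appears near the top of both lists.
```

Documents retrieved by both systems accumulate score from each list, which is why hybrid fusion tends to surface evidence that either retriever alone would rank lower.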

📝 Abstract
Large Language Models (LLMs) have demonstrated significant potential in medical Question Answering (QA), yet they remain prone to hallucinations and ungrounded reasoning, limiting their reliability in high-stakes clinical scenarios. While Retrieval-Augmented Generation (RAG) mitigates these issues by incorporating external knowledge, conventional single-shot retrieval often fails to resolve complex biomedical queries requiring multi-step inference. To address this, we propose Self-MedRAG, a self-reflective hybrid framework designed to mimic the iterative hypothesis-verification process of clinical reasoning. Self-MedRAG integrates a hybrid retrieval strategy, combining sparse (BM25) and dense (Contriever) retrievers via Reciprocal Rank Fusion (RRF) to maximize evidence coverage. It employs a generator to produce answers with supporting rationales, which are then assessed by a lightweight self-reflection module using Natural Language Inference (NLI) or LLM-based verification. If the rationale lacks sufficient evidentiary support, the system autonomously reformulates the query and iterates to refine the context. We evaluated Self-MedRAG on the MedQA and PubMedQA benchmarks. The results demonstrate that our hybrid retrieval approach significantly outperforms single-retriever baselines. Furthermore, the inclusion of the self-reflective loop yielded substantial gains, increasing accuracy on MedQA from 80.00% to 83.33% and on PubMedQA from 69.10% to 79.82%. These findings confirm that integrating hybrid retrieval with iterative, evidence-based self-reflection effectively reduces unsupported claims and enhances the clinical reliability of LLM-based systems.
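The retrieve–generate–verify–reformulate loop described in the abstract can be outlined as a small control-flow sketch. All callables (`retrieve`, `generate`, `verify`, `reformulate`) are hypothetical stand-ins for the paper's components (hybrid RRF retrieval, the generator, the NLI/LLM self-reflection module, and query reformulation); the iteration cap is an assumption, as the paper's stopping criterion is not given here.

```python
def self_reflective_answer(question, retrieve, generate, verify,
                           reformulate, max_iters=3):
    """Sketch of a self-reflective RAG loop.

    retrieve(query)            -> list of evidence passages
    generate(question, evid)   -> (answer, rationale)
    verify(rationale, evid)    -> True if the rationale is supported
                                  (e.g. an NLI entailment check)
    reformulate(question, rat) -> a refined retrieval query
    """
    query = question
    answer = None
    for _ in range(max_iters):
        evidence = retrieve(query)
        answer, rationale = generate(question, evidence)
        if verify(rationale, evidence):
            return answer  # rationale is evidence-supported: stop
        # Unsupported: refine the query and retrieve again
        query = reformulate(question, rationale)
    return answer  # best effort after max_iters


# Deterministic toy stubs, purely to show the control flow:
# verification fails on the first pass and succeeds after reformulation.
def retrieve(q):
    return [q]

def generate(q, evidence):
    return "ans:" + evidence[0], "rat:" + evidence[0]

def verify(rationale, evidence):
    return "v2" in rationale

def reformulate(q, rationale):
    return q + " v2"

result = self_reflective_answer("q", retrieve, generate, verify, reformulate)
```

The key design point is that verification gates termination: an answer is only returned once its rationale is judged entailed by the retrieved evidence, which is how the framework suppresses unsupported claims.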
Problem

Research questions and friction points this paper is trying to address.

medical question answering
hallucination
retrieval-augmented generation
clinical reasoning
evidence-based reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Retrieval
Self-Reflection
Retrieval-Augmented Generation
Clinical Reasoning
Evidence-Based Verification
Jessica Ryan
School of Computer Science, Bina Nusantara University, Anggrek Campus, Jl. Raya Kb. Jeruk No. 27, 11530, Jakarta, Indonesia
A. I. Gumilang
School of Computer Science, Bina Nusantara University, Anggrek Campus, Jl. Raya Kb. Jeruk No. 27, 11530, Jakarta, Indonesia
Robert Wiliam
School of Computer Science, Bina Nusantara University, Anggrek Campus, Jl. Raya Kb. Jeruk No. 27, 11530, Jakarta, Indonesia
Derwin Suhartono
Computer Science Department, Bina Nusantara University
Artificial Intelligence · Computational Linguistics · Personality Recognition