UniD$^3$: A Knowledge Graph-Enhanced RAG Framework for Drug-Disease Discovery and Reasoning

📅 2026-05-31

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses the challenge of systematically characterizing drug-disease relationships amid the rapid growth and high heterogeneity of biomedical literature. The authors propose UniD³, a unified framework that combines large language models with knowledge graph-enhanced retrieval-augmented generation (KG-RAG). Using a dual-stage strategy that pairs paper-level extraction with KG-level consolidation around drug and disease entities, the method processes 157,849 PubMed articles into six knowledge graphs and structured datasets, including 28,915 Drug-Disease Matching (DDM), 15,042 Drug Effectiveness Assessment (DEA), and over 4,000 Drug-Target Analysis (DTA) QA pairs. External validation reports F1 scores of 0.85-0.87 for DDM and DEA and 0.82 for DTA, and clinician review yields an AUROC of 0.90. KG-RAG-augmented models outperform standalone LLM baselines.

📝 Abstract

Systematic characterization of drug-disease relationships is essential for drug discovery and repurposing, yet is hindered by the heterogeneity and rapid growth of biomedical literature. Existing datasets rely on labor-intensive curation and are often incomplete, while LLM-only approaches suffer from hallucination and weak evidence grounding. We introduce UniD$^3$, a unified framework that integrates Large Language Models with Knowledge Graph-enhanced Retrieval-Augmented Generation (KG-RAG) to extract, organize, and validate drug-disease knowledge across Drug-Disease Matching (DDM), Drug Effectiveness Assessment (DEA), and Drug-Target Analysis (DTA). UniD$^3$ processes 157,849 PubMed articles with Llama 3.3-70B and constructs knowledge graphs via a dual-stage strategy combining paper-level extraction with KG-level consolidation centered on drug and disease entities. These graphs support KG-RAG-based generation of structured datasets, evaluated through external benchmarks, fuzzy matching with curated resources, and clinician review. UniD$^3$ produces six knowledge graphs and large-scale datasets, including 28,915 DDM, 15,042 DEA, and over 4,000 DTA QA pairs. External validation shows strong performance (F1: 0.85-0.87 for DDM/DEA; 0.82 for DTA), with clinician review confirming high reliability (AUROC = 0.90). KG-RAG-augmented models outperform standalone LLMs, and the UniD$^3$ chatbot enables interpretable, citation-supported exploration of drug-disease relationships. UniD$^3$ provides a scalable, extensible framework for transforming unstructured biomedical literature into high-quality, structured drug-disease knowledge, supporting AI-driven discovery, repurposing, and precision medicine.
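
The abstract mentions validation by fuzzy matching against curated resources, scored with F1. The sketch below illustrates one way such a check could work; it is not the authors' implementation. The function names, the 0.85 similarity threshold, the use of Python's `difflib`, and the toy drug-disease pairs are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def pair_matches(pred, ref, threshold=0.85):
    """A (drug, disease) pair matches when both names are similar enough."""
    return (similar(pred[0], ref[0]) >= threshold
            and similar(pred[1], ref[1]) >= threshold)


def fuzzy_f1(predicted, reference, threshold=0.85):
    """Greedy one-to-one fuzzy matching; returns (precision, recall, F1)."""
    used = set()  # indices of reference pairs already matched
    tp = 0
    for pred in predicted:
        for i, ref in enumerate(reference):
            if i not in used and pair_matches(pred, ref, threshold):
                used.add(i)
                tp += 1
                break
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data for illustration only (not taken from the paper's datasets).
predicted = [
    ("Metformin", "type 2 diabetes mellitus"),
    ("aspirin", "Migraine"),
    ("ibuprofen", "asthma"),
]
reference = [
    ("metformin", "Type 2 Diabetes Mellitus"),
    ("aspirin", "migraine"),
    ("warfarin", "atrial fibrillation"),
]

precision, recall, f1 = fuzzy_f1(predicted, reference)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

In this toy case two of the three extracted pairs match a curated pair, so precision, recall, and F1 all equal 2/3. A real evaluation would likely need synonym and ontology normalization in addition to string similarity.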

Problem

Research questions and friction points this paper is trying to address.

drug-disease relationship

biomedical literature heterogeneity

knowledge curation

LLM hallucination

evidence grounding

Innovation

Methods, ideas, or system contributions that make the work stand out.

KG-RAG

Knowledge Graph

Drug-Disease Discovery