🤖 AI Summary
This study addresses three bottlenecks in using large-scale biomedical knowledge graphs (KGs) for drug repurposing and drug–drug interaction (DDI) severity prediction: costly path retrieval, subgraph formats that only graph neural networks (GNNs) can consume, and weak interpretability. We propose K-Paths, a retrieval framework built on a diversity-aware variant of Yen's algorithm that extracts the K shortest loopless, biologically meaningful paths between the entities in an interaction query. K-Paths then converts these paths into a structured format that large language models (LLMs) can process directly, so that both LLMs and GNNs can use them and the retrieved paths can serve as explainable rationales. In zero-shot evaluation, K-Paths improves the F1-score of Llama 8.1B by 12.45 points on drug repurposing and 13.42 points on DDI severity prediction; Llama 70B gains 6.18 and 8.46 points, respectively. For EmerGNN, an existing state-of-the-art GNN, K-Paths reduces KG size by 90% while maintaining strong predictive performance, which improves supervised training efficiency. The framework thus connects KG retrieval with LLM and GNN inference while preserving interpretability and computational efficiency.
📝 Abstract
Drug discovery is a complex and time-intensive process that requires identifying and validating new therapeutic candidates. Computational approaches using large-scale biomedical knowledge graphs (KGs) offer a promising solution to accelerate this process. However, extracting meaningful insights from large-scale KGs remains challenging due to the complexity of graph traversal. Existing subgraph-based methods are tailored to graph neural networks (GNNs), making them incompatible with other models, such as large language models (LLMs). We introduce K-Paths, a retrieval framework that extracts structured, diverse, and biologically meaningful paths from KGs. Integrating these paths enables LLMs and GNNs to effectively predict unobserved drug-drug and drug-disease interactions. Unlike traditional path-ranking approaches, K-Paths retrieves and transforms paths into a structured format that LLMs can directly process, facilitating explainable reasoning. K-Paths employs a diversity-aware adaptation of Yen's algorithm to retrieve the K shortest loopless paths between entities in an interaction query, prioritizing biologically relevant and diverse relationships. Our experiments on benchmark datasets show that K-Paths improves the zero-shot F1-score of Llama 8.1B by 12.45 points on drug repurposing and 13.42 points on interaction severity prediction. Llama 70B achieves F1-score gains of 6.18 and 8.46 points on the same two tasks, respectively. K-Paths also improves the supervised training efficiency of EmerGNN, a state-of-the-art GNN, by reducing KG size by 90% while maintaining strong predictive performance. Beyond its scalability and efficiency, K-Paths bridges the gap between KGs and LLMs, providing explainable rationales for predicted interactions. These capabilities show that K-Paths is a valuable tool for efficient data-driven drug discovery.
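To make the retrieval idea in the abstract concrete, the following is a minimal sketch and not the K-Paths implementation. It rests on several assumptions: edges are unweighted, so "shortest" means fewest hops; a breadth-first enumeration of loopless paths stands in for Yen's algorithm; "diversity" is approximated by keeping only the first path for each distinct relation sequence; and the arrow-style text rendering is just one possible LLM-readable format. The triples, entity names, and the function names `loopless_paths`, `k_diverse_paths`, and `verbalize` are all hypothetical.

```python
from collections import defaultdict, deque

# Toy knowledge graph as (head, relation, tail) triples; all names are made up.
TRIPLES = [
    ("DrugA", "binds", "Gene1"),
    ("DrugA", "binds", "Gene2"),
    ("Gene1", "associated_with", "DiseaseX"),
    ("Gene2", "associated_with", "DiseaseX"),
    ("DrugA", "resembles", "DrugB"),
    ("DrugB", "treats", "DiseaseX"),
    ("DrugA", "causes", "Effect1"),
    ("Effect1", "presents_in", "DiseaseX"),
]


def build_adjacency(triples):
    """Map each head entity to its outgoing (relation, tail) edges."""
    adjacency = defaultdict(list)
    for head, relation, tail in triples:
        adjacency[head].append((relation, tail))
    return adjacency


def loopless_paths(adjacency, source, target, max_hops=3):
    """Yield loopless paths from source to target, fewest hops first.

    Each path is a list of (head, relation, tail) steps. A FIFO queue
    guarantees that shorter paths are produced before longer ones.
    """
    queue = deque([(source, [])])
    while queue:
        node, steps = queue.popleft()
        if node == target:
            if steps:
                yield steps
            continue
        if len(steps) == max_hops:
            continue
        visited = {source} | {tail for _, _, tail in steps}
        for relation, neighbor in adjacency.get(node, []):
            if neighbor not in visited:  # never revisit an entity
                queue.append((neighbor, steps + [(node, relation, neighbor)]))


def k_diverse_paths(triples, source, target, k=3, max_hops=3):
    """Keep up to k shortest paths, one per distinct relation sequence.

    Assumption: two paths with the same relation sequence are redundant.
    """
    adjacency = build_adjacency(triples)
    seen_signatures = set()
    selected = []
    for steps in loopless_paths(adjacency, source, target, max_hops):
        signature = tuple(relation for _, relation, _ in steps)
        if signature in seen_signatures:
            continue
        seen_signatures.add(signature)
        selected.append(steps)
        if len(selected) == k:
            break
    return selected


def verbalize(steps):
    """Render one path as a line of text that can be placed in an LLM prompt."""
    text = steps[0][0]
    for _, relation, tail in steps:
        text += f" -[{relation}]-> {tail}"
    return text


for path in k_diverse_paths(TRIPLES, "DrugA", "DiseaseX", k=3):
    print(verbalize(path))
```

On this toy graph the path through `Gene2` is dropped because it repeats the relation sequence of the path through `Gene1`, so the three retained paths each expose a different kind of connection between the query entities. A weighted KG would need a proper K-shortest-paths routine in place of the breadth-first enumeration.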