Combining LLMs and Knowledge Graphs to Reduce Hallucinations in Question Answering

📅 2024-09-06
🏛️ arXiv.org
📈 Citations: 4
Influential: 0

🤖 AI Summary
To address the hallucinations that large language models (LLMs) produce in biomedical question answering, where errors carry high risk, this paper combines LLMs with knowledge graphs (KGs). Built on LangChain, the method adds a query checker that validates the syntax and semantics of LLM-generated Cypher queries before they are run against the KG. A web-based interface accepts natural-language questions, displays the generated and corrected Cypher queries, and lets users inspect the resulting paths. The main contributions are (i) an LLM-KG question-answering pipeline with query validation and (ii) open-source code for the experiments and the user interface. On a new benchmark of 50 biomedical questions, GPT-4 Turbo generated the most accurate queries among the models tested, while open-source models such as llama3:70b showed promise with appropriate prompt engineering.

📝 Abstract
Advancements in natural language processing have revolutionized the way we can interact with digital information systems, such as databases, making them more accessible. However, challenges persist, especially when accuracy is critical, as in the biomedical domain. A key issue is the hallucination problem, where models generate information unsupported by the underlying data, potentially leading to dangerous misinformation. This paper presents a novel approach designed to bridge this gap by combining Large Language Models (LLMs) and Knowledge Graphs (KGs) to improve the accuracy and reliability of question-answering systems, using a biomedical KG as an example. Built on the LangChain framework, our method incorporates a query checker that ensures the syntactical and semantic validity of LLM-generated queries, which are then used to extract information from a Knowledge Graph, substantially reducing errors like hallucinations. We evaluated the overall performance using a new benchmark dataset of 50 biomedical questions, testing several LLMs, including GPT-4 Turbo and llama3:70b. Our results indicate that while GPT-4 Turbo outperforms other models in generating accurate queries, open-source models like llama3:70b show promise with appropriate prompt engineering. To make this approach accessible, a user-friendly web-based interface has been developed, allowing users to input natural language queries, view generated and corrected Cypher queries, and verify the resulting paths for accuracy. Overall, this hybrid approach effectively addresses common issues such as data gaps and hallucinations, offering a reliable and intuitive solution for question-answering systems. The source code for generating the results of this paper and for the user interface can be found in our Git repository: https://git.zib.de/lpusch/cyphergenkg-gui

Problem

Research questions and friction points this paper is trying to address.

Reducing hallucinations in question answering systems using LLMs and Knowledge Graphs
Improving the accuracy of biomedical question answering through a hybrid LLM-KG approach
Ensuring syntactical and semantic validity of LLM-generated queries for Knowledge Graphs

Innovation

Methods, ideas, or system contributions that make the work stand out.

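Combining LLMs with Knowledge Graphs to improve answer accuracy
Using a query checker to validate and correct LLM-generated Cypher queries
Developing a web interface for natural language queries that shows the generated and corrected Cypher and the resulting paths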
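
The query-checker idea above can be illustrated with a small sketch. This is not the paper's implementation and does not use LangChain. It covers only one kind of semantic check: whether the direction of a relationship in a generated Cypher query agrees with the graph schema. The schema contents, the `Edge` type, and the `check_query` function are hypothetical names introduced for illustration, and the regular expression handles only simple `(var:Label)-[:REL]->(var:Label)` patterns.

```python
import re
from typing import NamedTuple


class Edge(NamedTuple):
    """One allowed relationship: (source label)-[rel]->(target label)."""
    source: str
    rel: str
    target: str


# Hypothetical toy schema; a real system would read this from the graph.
SCHEMA = frozenset({
    Edge("Drug", "TREATS", "Disease"),
    Edge("Gene", "ASSOCIATED_WITH", "Disease"),
})

# Matches simple patterns such as (a:Label)-[:REL]->(b:Label).
# Node property maps and variable-length paths are not handled.
PATTERN = re.compile(
    r"\((\w*):(\w+)\)"             # left node: optional variable, label
    r"(<?-)\[(\w*):(\w+)\](->?)"   # relationship with optional arrows
    r"\((\w*):(\w+)\)"             # right node: optional variable, label
)


def check_query(query: str, schema=SCHEMA):
    """Return (possibly corrected query, list of issues found)."""
    issues = []

    def fix(match):
        lvar, llabel, larrow, relvar, rel, rarrow, rvar, rlabel = match.groups()
        forward = Edge(llabel, rel, rlabel) in schema    # left -> right allowed
        backward = Edge(rlabel, rel, llabel) in schema   # right -> left allowed
        points_right = larrow == "-" and rarrow == "->"
        points_left = larrow == "<-" and rarrow == "-"

        if not (forward or backward):
            # The schema has no such edge in either direction: report only.
            issues.append(f"no {rel} edge between {llabel} and {rlabel}")
            return match.group(0)
        if points_right and not forward:
            issues.append(f"reversed direction of {rel}")
            return f"({lvar}:{llabel})<-[{relvar}:{rel}]-({rvar}:{rlabel})"
        if points_left and not backward:
            issues.append(f"reversed direction of {rel}")
            return f"({lvar}:{llabel})-[{relvar}:{rel}]->({rvar}:{rlabel})"
        return match.group(0)

    return PATTERN.sub(fix, query), issues


if __name__ == "__main__":
    generated = "MATCH (d:Disease)-[:TREATS]->(x:Drug) RETURN x.name"
    corrected, problems = check_query(generated)
    print(corrected)
    print(problems)
```

In this sketch a reversed relationship is rewritten in place, while a relationship that the schema does not contain at all is only reported, since there is no safe automatic correction for it. A web interface like the one described above could display both the generated and the corrected query alongside the reported issues.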