LLM-based SPARQL Query Generation from Natural Language over Federated Knowledge Graphs

📅 2024-10-08
🏛️ arXiv.org
🤖 AI Summary
Problem: Translating natural-language questions into SPARQL over federated bioinformatics knowledge graphs (KGs) suffers from high error rates and severe large language model (LLM) hallucination. Method: This paper proposes a retrieval-augmented generation (RAG) framework that integrates KG metadata, including schema structure and high-quality SPARQL query examples, through three tightly coupled components: schema-aware retrieval, example-driven generation, and executability verification (syntactic validation, semantic consistency checking, and automated correction). Contribution/Results: Experiments demonstrate substantial improvements in query accuracy and robustness, achieving state-of-the-art (SOTA) performance on real-world federated KG benchmarks. The framework has been deployed on the Expasy platform (chat.expasy.org), enabling reliable biomedical knowledge question answering and exploratory querying in production settings.

📝 Abstract
We introduce a Retrieval-Augmented Generation (RAG) system for translating user questions into accurate federated SPARQL queries over bioinformatics knowledge graphs (KGs) leveraging Large Language Models (LLMs). To enhance accuracy and reduce hallucinations in query generation, our system utilises metadata from the KGs, including query examples and schema information, and incorporates a validation step to correct generated queries. The system is available online at chat.expasy.org.
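The retrieval half of such a system can be sketched roughly as follows: rank stored question/query pairs by similarity to the user question, then assemble a prompt from the best matches plus schema notes. This is a minimal sketch under stated assumptions, not the deployed implementation. The example store, the placeholder endpoint `https://example.org/sparql`, the elided queries, the function names, and the bag-of-words similarity (standing in for whatever similarity measure the real system uses) are all illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical example store. Questions, elided queries, and the example.org
# endpoint are placeholders for illustration, not taken from the paper.
EXAMPLES = [
    {"question": "List human proteins annotated with a disease",
     "endpoint": "https://sparql.uniprot.org/sparql",
     "query": "SELECT ?protein WHERE { ... }"},
    {"question": "Find orthologs of a gene in another species",
     "endpoint": "https://example.org/sparql",
     "query": "SELECT ?ortholog WHERE { ... }"},
    {"question": "Retrieve reactions involving a given metabolite",
     "endpoint": "https://example.org/sparql",
     "query": "SELECT ?reaction WHERE { ... }"},
]


def bag_of_words(text):
    """Lowercased token counts; a crude stand-in for a text embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def retrieve_examples(question, examples, k=2):
    """Return the k stored examples whose questions best match the input."""
    q = bag_of_words(question)
    ranked = sorted(examples,
                    key=lambda ex: cosine(q, bag_of_words(ex["question"])),
                    reverse=True)
    return ranked[:k]


def build_prompt(question, examples, schema_notes):
    """Assemble schema notes, retrieved examples, and the question."""
    parts = ["Schema information:", schema_notes, "", "Example queries:"]
    for ex in examples:
        parts += [f"# {ex['question']} (endpoint: {ex['endpoint']})",
                  ex["query"], ""]
    parts += ["Write a federated SPARQL query answering:", question]
    return "\n".join(parts)
```

Calling `retrieve_examples(question, EXAMPLES, k=1)` and passing the result to `build_prompt` yields the text that would be sent to the LLM; the model call itself is left out because this page does not say which model or API is used.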
Problem

Research questions and friction points this paper is trying to address.

Generate SPARQL queries from natural language
Improve accuracy in federated knowledge graphs
Reduce hallucinations using metadata validation
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLM-based SPARQL query generation
Retrieval-Augmented Generation system
Metadata-enhanced query validation
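To make the metadata-based validation idea concrete, the sketch below checks the prefixed names in a generated query against a small schema vocabulary and turns any problems into a correction request. It is an assumption-laden illustration, not the paper's validator: the `SCHEMA` subset, the function names, and the regex-based extraction (a real system would more likely use a SPARQL parser) are hypothetical, and `up:hasGene` is a deliberately invented term standing in for a hallucination.

```python
import re

# Illustrative subset of schema terms keyed by namespace (an assumption, not
# the metadata the deployed system actually loads).
SCHEMA = {
    "http://purl.uniprot.org/core/": {"Protein", "organism", "encodedBy"},
}

PREFIX_DECL = re.compile(r"PREFIX\s+(\w*):\s*<([^>]*)>", re.IGNORECASE)
PREFIXED_NAME = re.compile(r"(?<![\w:])(\w*):(\w+)")


def find_issues(query, schema):
    """List undeclared prefixes and terms missing from the known schema."""
    prefixes = dict(PREFIX_DECL.findall(query))
    body = PREFIX_DECL.sub("", query)
    # Crude cleanup: drop full IRIs and string literals so they are not
    # mistaken for prefixed names. This would mangle FILTERs using < and >.
    body = re.sub(r"<[^>]*>", "", body)
    body = re.sub(r'"[^"]*"', "", body)
    issues = []
    for prefix, local in PREFIXED_NAME.findall(body):
        namespace = prefixes.get(prefix)
        if namespace is None:
            issue = f"prefix '{prefix}:' is not declared"
        elif namespace in schema and local not in schema[namespace]:
            issue = f"{prefix}:{local} is not defined in the schema"
        else:
            continue  # declared prefix, and either known or outside the schema
        if issue not in issues:
            issues.append(issue)
    return issues


def correction_prompt(query, issues):
    """Build a follow-up message asking the model to repair the query."""
    lines = ["The generated query has the following problems:"]
    lines += [f"- {issue}" for issue in issues]
    lines += ["Please return a corrected query.", "", query]
    return "\n".join(lines)


# A generated query containing one invented term and one undeclared prefix.
GENERATED = """PREFIX up: <http://purl.uniprot.org/core/>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
SELECT ?protein ?gene WHERE {
  ?protein a up:Protein ;
           up:organism taxon:9606 ;
           up:hasGene ?gene .
  ?gene rdfs:label ?name .
}"""
```

Here `find_issues(GENERATED, SCHEMA)` flags `up:hasGene` and the undeclared `rdfs:` prefix, and `correction_prompt` wraps those findings into a message that could be sent back to the model for another attempt. Terms in namespaces the schema does not cover, such as `taxon:9606`, are passed through unchecked.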
Vincent Emonet
Data science developer, Swiss Institute of Bioinformatics
Semantic Web, ontologies, linked open data
Jerven T. Bolleman
SIB Swiss Institute of Bioinformatics, Switzerland
Severine Duvaud
SIB Swiss Institute of Bioinformatics, Switzerland
Tarcisio Mendes de Farias
SIB Swiss Institute of Bioinformatics, Switzerland
Ana Claudia Sima
Co-Team Lead, Knowledge Representation Unit at SIB Swiss Institute of Bioinformatics
Question Answering, Semantic Integration, Knowledge Graphs, Large Language Models