AI Summary
This work addresses the challenge of selecting and evaluating knowledge graph (KG) construction methodologies for question-answering (QA) over the GDELT global event dataset. We systematically compare ontology-driven and large language model (LLM)-driven KG construction paradigms. Our method introduces the first GDELT-specific ontology-based KG framework, integrating domain-specific schema with LLM-generated structural elements, and proposes a hybrid reasoning mechanism that synergizes graph querying and graph retrieval. It incorporates vector retrieval, ontology modeling, Cypher/SPARQL querying, and graph neural network-based retrieval. Experimental results demonstrate that ontology-based KGs significantly improve QA accuracy and interpretability; while LLM-derived KGs excel at summarization, they suffer from low structural consistency. Our ontology-LLM co-generation approach balances formal rigor with semantic flexibility, establishing a novel paradigm for KG construction in retrieval-augmented generation (RAG) systems.
Abstract
In this work we study various Retrieval-Augmented Generation (RAG) approaches to understand the strengths and weaknesses of each in a question-answering analysis. To this end, we use a case-study subset of the Global Database of Events, Language, and Tone (GDELT) dataset as well as a corpus of raw text scraped from online news articles. To retrieve information from the text corpus, we implement a traditional vector store RAG as well as state-of-the-art large language model (LLM) based approaches for automatically constructing knowledge graphs (KGs) and retrieving the relevant subgraphs. In addition to these corpus approaches, we develop a novel ontology-based framework for constructing KGs directly from GDELT, which leverages the underlying schema of GDELT to create structured representations of global events. To retrieve relevant information from the ontology-based KGs, we implement both direct graph queries and state-of-the-art graph retrieval approaches. We compare the performance of each method in a question-answering task. We find that while our ontology-based KGs are valuable for question-answering, automated extraction of the relevant subgraphs is challenging. Conversely, LLM-generated KGs, while capturing event summaries, often lack consistency and interpretability. Our findings suggest benefits of a synergistic approach between ontology-based and LLM-based KG construction, and we propose avenues toward that end.