Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models

πŸ“… 2025-11-29
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the insufficient attention paid by existing large language model (LLM) systems to the intrinsic quality of knowledge graphs (KGs). We propose an end-to-end, LLM-driven pipeline for constructing high-quality, Wikidata-aligned, and ontology-aware KGs from open-domain text. Our method comprises: (1) multi-stage constrained triplet extraction; (2) Wikidata type validation and relation normalization; (3) constraint-based logical reasoning for entity canonicalization; and (4) knowledge fusion. Unlike baselines that rely on redundant textual context, our approach achieves superior question-answering performance using only concise, refined triplets: 76.0 F1 on HotpotQA, 59.8 F1 on MuSiQue (with 96% coverage of correct answer entities), and 86% information retention on MINE-1. KG construction requires fewer than 1,000 output tokens, over three times fewer than AriGraph. Our key contribution is the first deep integration of LLMs across the entire KG construction pipeline, significantly enhancing compactness, connectivity, and utility while preserving ontological consistency.
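The type-validation and entity-normalization stages can be illustrated with a minimal sketch. This is not the paper's implementation: the constraint table, alias map, and type assignments below are hypothetical stand-ins for the Wikidata lookups and LLM-driven reasoning the pipeline actually uses.

```python
# Sketch of Wikontic-style triplet filtering (hypothetical data; the paper's
# actual prompts, models, and live Wikidata queries are not reproduced here).

# Hypothetical subset of Wikidata-style property constraints:
# relation -> (required subject type, required object type)
PROPERTY_CONSTRAINTS = {
    "educated at": ("human", "organization"),
    "capital of": ("city", "country"),
}

# Hypothetical alias table for entity canonicalization.
ALIASES = {"NYC": "New York City", "Big Apple": "New York City"}

# Hypothetical entity-type assignments (in the paper, resolved via Wikidata).
ENTITY_TYPES = {
    "Marie Curie": "human",
    "University of Paris": "organization",
    "New York City": "city",
}


def normalize(entity: str) -> str:
    """Map surface forms to one canonical name to reduce duplication."""
    return ALIASES.get(entity, entity)


def validate(triplet):
    """Keep a (subject, relation, object) triplet only if its entity types
    satisfy the relation's domain/range constraint; otherwise drop it."""
    subj, rel, obj = triplet
    subj, obj = normalize(subj), normalize(obj)
    constraint = PROPERTY_CONSTRAINTS.get(rel)
    if constraint is None:
        return None  # unknown relation: reject rather than pollute the KG
    subj_type, obj_type = constraint
    if ENTITY_TYPES.get(subj) == subj_type and ENTITY_TYPES.get(obj) == obj_type:
        return (subj, rel, obj)
    return None


# Candidate triplets as an extraction stage might emit them.
candidates = [
    ("Marie Curie", "educated at", "University of Paris"),  # type-consistent
    ("NYC", "educated at", "Marie Curie"),                  # violates constraint
]
kg = [v for t in candidates if (v := validate(t))]
# Only the type-consistent, normalized triplet survives into the KG.
```

The design choice this illustrates is that constraint checking acts as a hard filter between extraction and fusion, which is how the pipeline keeps the resulting KG compact and ontology-consistent.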

πŸ“ Abstract
Knowledge graphs (KGs) provide structured, verifiable grounding for large language models (LLMs), but current LLM-based systems commonly use KGs as auxiliary structures for text retrieval, leaving their intrinsic quality underexplored. In this work, we propose Wikontic, a multi-stage pipeline that constructs KGs from open-domain text by extracting candidate triplets with qualifiers, enforcing Wikidata-based type and relation constraints, and normalizing entities to reduce duplication. The resulting KGs are compact, ontology-consistent, and well-connected; on MuSiQue, the correct answer entity appears in 96% of generated triplets. On HotpotQA, our triplets-only setup achieves 76.0 F1, and on MuSiQue 59.8 F1, matching or surpassing several retrieval-augmented generation baselines that still require textual context. In addition, Wikontic attains state-of-the-art information-retention performance on the MINE-1 benchmark (86%), outperforming prior KG construction methods. Wikontic is also efficient at build time: KG construction uses fewer than 1,000 output tokens, about 3× fewer than AriGraph and less than 1/20 of GraphRAG. The proposed pipeline enhances the quality of the generated KG and offers a scalable solution for leveraging structured knowledge in LLMs.
Problem

Research questions and friction points this paper is trying to address.

Constructs Wikidata-aligned knowledge graphs from text
Enforces ontology constraints to reduce entity duplication
Improves structured knowledge quality for LLMs efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage pipeline extracts triplets with qualifiers
Enforces Wikidata-based type and relation constraints
Normalizes entities to reduce duplication and enhance connectivity
πŸ”Ž Similar Papers
No similar papers found.