Wikontic: Constructing Wikidata-Aligned, Ontology-Aware Knowledge Graphs with Large Language Models

πŸ“… 2025-11-29
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
This work addresses the insufficient attention paid by existing large language model (LLM) systems to the intrinsic quality of knowledge graphs (KGs). We propose an end-to-end, LLM-driven pipeline for constructing high-quality, Wikidata-aligned, and ontology-aware KGs from open-domain text. Our method comprises: (1) multi-stage constrained triplet extraction; (2) Wikidata type validation and relation normalization; (3) constraint-based logical reasoning for entity canonicalization; and (4) knowledge fusion. Unlike baselines that rely on redundant textual context, our approach achieves superior question-answering performance using only concise, refined triplets: 76.0 F1 on HotpotQA, 59.8 F1 on MuSiQue (with 96% coverage of correct answer entities), and 86% information retention on MINE-1. KG construction requires fewer than 1,000 output tokens, over three times fewer than AriGraph. Our key contribution is the first deep integration of LLMs across the entire KG construction pipeline, significantly enhancing compactness, connectivity, and utility while preserving ontological consistency.
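The type-validation and entity-normalization stages can be illustrated with a minimal sketch. This is not the paper's implementation: the constraint table, alias map, and type assignments below are hypothetical stand-ins for the Wikidata lookups and LLM-driven reasoning the pipeline actually uses.

```python
# Sketch of Wikontic-style triplet filtering (hypothetical data; the paper's
# actual prompts, models, and live Wikidata queries are not reproduced here).

# Hypothetical subset of Wikidata-style property constraints:
# relation -> (required subject type, required object type)
PROPERTY_CONSTRAINTS = {
    "educated at": ("human", "organization"),
    "capital of": ("city", "country"),
}

# Hypothetical alias table for entity canonicalization.
ALIASES = {"NYC": "New York City", "Big Apple": "New York City"}

# Hypothetical entity-type assignments (in the paper, resolved via Wikidata).
ENTITY_TYPES = {
    "Marie Curie": "human",
    "University of Paris": "organization",
    "New York City": "city",
}


def normalize(entity: str) -> str:
    """Map surface forms to one canonical name to reduce duplication."""
    return ALIASES.get(entity, entity)


def validate(triplet):
    """Keep a (subject, relation, object) triplet only if its entity types
    satisfy the relation's domain/range constraint; otherwise drop it."""
    subj, rel, obj = triplet
    subj, obj = normalize(subj), normalize(obj)
    constraint = PROPERTY_CONSTRAINTS.get(rel)
    if constraint is None:
        return None  # unknown relation: reject rather than pollute the KG
    subj_type, obj_type = constraint
    if ENTITY_TYPES.get(subj) == subj_type and ENTITY_TYPES.get(obj) == obj_type:
        return (subj, rel, obj)
    return None


# Candidate triplets as an extraction stage might emit them.
candidates = [
    ("Marie Curie", "educated at", "University of Paris"),  # type-consistent
    ("NYC", "educated at", "Marie Curie"),                  # violates constraint
]
kg = [v for t in candidates if (v := validate(t))]
# Only the type-consistent, normalized triplet survives into the KG.
```

The design choice this illustrates is that constraint checking acts as a hard filter between extraction and fusion, which is how the pipeline keeps the resulting KG compact and ontology-consistent.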

πŸ“ Abstract
Knowledge graphs (KGs) provide structured, verifiable grounding for large language models (LLMs), but current LLM-based systems commonly use KGs as auxiliary structures for text retrieval, leaving their intrinsic quality underexplored. In this work, we propose Wikontic, a multi-stage pipeline that constructs KGs from open-domain text by extracting candidate triplets with qualifiers, enforcing Wikidata-based type and relation constraints, and normalizing entities to reduce duplication. The resulting KGs are compact, ontology-consistent, and well-connected; on MuSiQue, the correct answer entity appears in 96% of generated triplets. On HotpotQA, our triplets-only setup achieves 76.0 F1, and on MuSiQue 59.8 F1, matching or surpassing several retrieval-augmented generation baselines that still require textual context. In addition, Wikontic attains state-of-the-art information-retention performance on the MINE-1 benchmark (86%), outperforming prior KG construction methods. Wikontic is also efficient at build time: KG construction uses fewer than 1,000 output tokens, about 3× fewer than AriGraph and less than 1/20 of GraphRAG. The proposed pipeline enhances the quality of the generated KG and offers a scalable solution for leveraging structured knowledge in LLMs.
Problem

Research questions and friction points this paper is trying to address.

Constructs Wikidata-aligned knowledge graphs from text
Enforces ontology constraints to reduce entity duplication
Improves structured knowledge quality for LLMs efficiently
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-stage pipeline extracts triplets with qualifiers
Enforces Wikidata-based type and relation constraints
Normalizes entities to reduce duplication and enhance connectivity
πŸ”Ž Similar Papers
No similar papers found.