🤖 AI Summary
Current approaches that use large language models (LLMs) to build knowledge graphs (KGs) mostly stop at sentence-level entity and relation extraction, which falls short of the semantic richness and structural integrity that high-stakes domains require. To address this, we propose a hierarchical knowledge extraction framework that looks beyond the single sentence, integrating information at multiple granularities, from cross-sentence to discourse level, and modeling its hierarchical structure. Our method is a prompt-driven, end-to-end extraction pipeline that combines structured constraints with semantic post-processing. We also release a curated dataset of LLM-generated KGs derived from research papers on children's mental well-being. The resulting KGs are evaluated for both structural coherence and semantic accuracy, and the results highlight the strengths and shortcomings of current LLMs in KG construction. This work aims to support more interpretable and reliable AI in high-stakes domains such as healthcare.
📝 Abstract
Knowledge graphs (KGs) are vital for knowledge-intensive tasks and have shown promise in reducing hallucinations in large language models (LLMs). However, constructing high-quality KGs remains difficult, requiring accurate information extraction and structured representations that support interpretability and downstream utility. Existing LLM-based approaches often focus narrowly on entity and relation extraction, limiting coverage to sentence-level contexts or relying on predefined schemas. We propose a hierarchical extraction framework that organizes information at multiple levels, enabling the creation of semantically rich and well-structured KGs. Using state-of-the-art LLMs, we extract and construct knowledge graphs and evaluate them comprehensively from both structural and semantic perspectives. Our results highlight the strengths and shortcomings of current LLMs in KG construction and identify key challenges for future work. To advance research in this area, we also release a curated dataset of LLM-generated KGs derived from research papers on children's mental well-being. This resource aims to foster more transparent, reliable, and impactful applications in high-stakes domains such as healthcare.