🤖 AI Summary
This work addresses a gap in constructing oncology knowledge graphs from unstructured clinical text: existing approaches often rely on structured inputs and lack robust validation of factual accuracy and semantic consistency. The authors propose an end-to-end KG-RAG framework that combines multi-agent prompting, schema-constrained retrieval-augmented generation, and ontology-aligned RDF/OWL semantic modeling to extract entities, attributes, and relations directly from free text. To detect hallucinations and improve semantic fidelity, the method adds entropy-based uncertainty scoring and multi-LLM consensus validation. It also supports continuous refinement and self-supervised evaluation without gold-standard annotations. Applied to PDAC and BRCA oncology cohorts, the resulting knowledge graphs are interpretable, SPARQL-compatible, and clinically grounded, with consistent gains over baseline methods in precision, relevance, and ontology compliance.
📝 Abstract
Large language models (LLMs) offer new opportunities for constructing knowledge graphs (KGs) from unstructured clinical narratives. However, existing approaches often rely on structured inputs and lack robust validation of factual accuracy and semantic consistency, limitations that are especially problematic in oncology. We introduce an end-to-end framework for clinical KG construction and evaluation directly from free text using multi-agent prompting and a schema-constrained Retrieval-Augmented Generation (KG-RAG) strategy. Our pipeline integrates (1) prompt-driven entity, attribute, and relation extraction; (2) entropy-based uncertainty scoring; (3) ontology-aligned RDF/OWL schema generation; and (4) multi-LLM consensus validation for hallucination detection and semantic refinement. Beyond static graph construction, the framework supports continuous refinement and self-supervised evaluation, enabling iterative improvement of graph quality. Applied to two oncology cohorts (PDAC and BRCA), our method produces interpretable, SPARQL-compatible, and clinically grounded knowledge graphs without relying on gold-standard annotations. Experimental results demonstrate consistent gains in precision, relevance, and ontology compliance over baseline methods.
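As a rough illustration of how entropy-based uncertainty scoring and multi-LLM consensus could fit together, the sketch below scores the disagreement among relation labels proposed by several models and accepts the majority label only when disagreement is low. This is a minimal sketch and not the authors' implementation: the function names `vote_entropy` and `consensus`, the normalization by the log of the number of voters, and the 0.6 threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def vote_entropy(labels):
    """Normalized Shannon entropy of the labels proposed by several models.

    Returns 0.0 when all models agree and 1.0 when every model proposes a
    different label. (Hypothetical helper; the normalization is an assumption.)
    """
    total = len(labels)
    if total == 0:
        raise ValueError("at least one label is required")
    if total == 1:
        return 0.0  # a single vote carries no disagreement
    counts = Counter(labels)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / math.log2(total)  # log2(total) is the maximum possible entropy


def consensus(labels, max_entropy=0.6):
    """Return the majority label, or None if the models disagree too much.

    The 0.6 threshold is an arbitrary illustrative value, not one from the paper.
    """
    if vote_entropy(labels) > max_entropy:
        return None  # flag the candidate triple for review or re-prompting
    return Counter(labels).most_common(1)[0][0]


if __name__ == "__main__":
    # Three hypothetical models label the relation between a drug and a tumor.
    print(consensus(["treats", "treats", "treats"]))
    print(consensus(["treats", "treats", "causes"]))
    print(consensus(["treats", "causes", "associated_with"]))
```

A rejected candidate (a `None` result) would be the natural point to trigger the refinement loop the abstract describes, for example by re-querying the models with retrieved context.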