🤖 AI Summary
Large language models (LLMs) frequently introduce subtle factual errors in long-form generation, a problem that is especially serious in high-stakes domains such as medicine. Existing fact-checking methods that rely on grounding documents struggle with multi-hop reasoning and incur high computational overhead.
Method: We propose the first single-pass fact-checking framework that injects an extracted knowledge graph (KG) into an LLM as a soft prompt. The approach combines automated KG construction, graph neural network (GNN)-based representation learning, and soft prompt tuning enhanced with structured reasoning, enabling end-to-end modeling of multi-hop relations.
Contribution/Results: Evaluated on seven benchmarks spanning general and medical domains, our method achieves an average 6.1% improvement over strong baselines. It performs comparably to DeepSeek-V3 and OpenAI-o1 while using significantly fewer parameters, and because it needs only a single inference call, it avoids the repeated model calls of conventional pairwise-comparison fact-checkers.
📝 Abstract
Large language models (LLMs) are widely used, but they often generate subtle factual errors, especially in long-form text. Such errors can be fatal in specialized domains such as medicine. Existing methods for fact-checking against grounding documents face two main challenges: (1) they struggle to understand complex multi-hop relations in long documents, often overlooking subtle factual errors; (2) most specialized methods rely on pairwise comparisons that require multiple model calls, leading to high resource and computational costs. To address these challenges, we propose GraphCheck, a fact-checking framework that uses extracted knowledge graphs to enhance text representation. Graph neural networks (GNNs) process these graphs into a soft prompt, enabling LLMs to incorporate structured knowledge more effectively. Enhanced with graph-based reasoning, GraphCheck captures multi-hop reasoning chains that existing methods often overlook, enabling precise and efficient fact-checking in a single inference call. Experimental results on seven benchmarks spanning both general and medical domains demonstrate a 6.1% overall improvement over baseline models. Notably, GraphCheck outperforms existing specialized fact-checkers and achieves performance comparable to state-of-the-art LLMs, such as DeepSeek-V3 and OpenAI-o1, with significantly fewer parameters.
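To make the graph-as-soft-prompt idea more concrete, the following is a minimal sketch and not the GraphCheck implementation. Everything in it is an assumption made for illustration: the example triples, the embedding width `DIM`, the random node features, the mean-aggregation rule (which ignores relation labels), the choice to encode a document graph and a claim graph separately, and all function names. It only shows how extracted triples could be turned into a graph, encoded with a few rounds of message passing, pooled into one vector each, and prepended to token embeddings so that a single forward pass sees both text and structure.

```python
# Illustrative sketch only: names, sizes, and the aggregation rule are
# assumptions, not details taken from the paper.
import random

random.seed(0)
DIM = 4  # hypothetical embedding width


def rand_vec(dim=DIM):
    """Stand-in for a learned embedding."""
    return [random.uniform(-1.0, 1.0) for _ in range(dim)]


def mean(vectors):
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def build_graph(triples):
    """Turn (head, relation, tail) triples into an undirected adjacency map."""
    neighbors = {}
    for head, _relation, tail in triples:
        neighbors.setdefault(head, set()).add(tail)
        neighbors.setdefault(tail, set()).add(head)
    return neighbors


def message_pass(features, neighbors):
    """One round of mean aggregation over each node and its neighbors."""
    return {
        node: mean([features[node]] + [features[n] for n in sorted(adj)])
        for node, adj in neighbors.items()
    }


def graph_soft_prompt(triples, rounds=2):
    """Encode a set of triples as one pooled vector (the 'soft prompt')."""
    neighbors = build_graph(triples)
    features = {node: rand_vec() for node in sorted(neighbors)}
    for _ in range(rounds):
        features = message_pass(features, neighbors)
    return mean([features[node] for node in sorted(features)])


# Hypothetical triples extracted from a grounding document and from a claim.
doc_triples = [
    ("aspirin", "inhibits", "COX-1"),
    ("COX-1", "produces", "thromboxane"),
]
claim_triples = [("aspirin", "reduces", "thromboxane")]

# Hypothetical token embeddings for the textual prompt.
token_embeddings = [rand_vec() for _ in range(5)]

# Graph vectors are prepended to the text so one forward pass sees both.
llm_input = (
    [graph_soft_prompt(doc_triples), graph_soft_prompt(claim_triples)]
    + token_embeddings
)
```

With two rounds of message passing, each node's vector mixes in information from nodes up to two hops away, which is the sense in which a graph encoder can expose a chain such as aspirin → COX-1 → thromboxane. A trained system would presumably use learned node and relation features and a learned projection into the LLM's hidden size rather than the random vectors used here.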