🤖 AI Summary
To address the low reliability of reasoning chains generated solely by large language models (LLMs) in chain-of-thought (CoT) reasoning, and the interference of natural-language reasoning chains with LLMs' inference logic, this paper proposes CoT-RAG, a knowledge graph (KG)-enhanced CoT framework. Methodologically, it introduces (i) Knowledge Graph-driven CoT Generation, which uses KGs to modulate the generation of reasoning chains and improve their credibility; (ii) Learnable Knowledge Case-aware RAG, which incorporates retrieval-augmented generation (RAG) into KGs to retrieve relevant sub-cases and sub-descriptions as learnable information for LLMs; and (iii) Pseudo-Program Prompting Execution, which has LLMs execute reasoning steps as pseudo-programs with code-like structure for greater logical rigor. Evaluated on nine public reasoning benchmarks covering three types of reasoning problems, CoT-RAG achieves accuracy improvements of 4.0–23.0% over state-of-the-art methods; further tests on four domain-specific datasets confirm its high accuracy, efficient execution, and strong practical applicability and scalability.
📝 Abstract
While chain-of-thought (CoT) reasoning improves the performance of large language models (LLMs) on complex tasks, it still faces two main challenges: the low reliability of relying solely on LLMs to generate reasoning chains, and the interference of natural-language reasoning chains with the inference logic of LLMs. To address these issues, we propose CoT-RAG, a novel reasoning framework with three key designs: (i) Knowledge Graph-driven CoT Generation, which uses knowledge graphs to modulate the reasoning-chain generation of LLMs, thereby enhancing reasoning credibility; (ii) Learnable Knowledge Case-aware RAG, which incorporates retrieval-augmented generation (RAG) into knowledge graphs to retrieve relevant sub-cases and sub-descriptions, providing LLMs with learnable information; and (iii) Pseudo-Program Prompting Execution, which encourages LLMs to execute reasoning tasks as pseudo-programs with greater logical rigor. We conduct a comprehensive evaluation on nine public datasets covering three reasoning problems. Compared with state-of-the-art methods, CoT-RAG achieves a significant accuracy improvement, ranging from 4.0% to 23.0%. Furthermore, in tests on four domain-specific datasets, CoT-RAG shows remarkable accuracy and efficient execution, highlighting its strong practical applicability and scalability.
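To make the Pseudo-Program Prompting Execution idea concrete, here is a minimal sketch (not the paper's actual code; the function name, step structure, and example question are hypothetical) of how KG-derived reasoning steps might be rendered as a pseudo-program prompt rather than free-form prose:

```python
# Illustrative sketch only: express a reasoning chain as code-like
# pseudo-program steps, in the spirit of CoT-RAG's Pseudo-Program
# Prompting Execution. All names here are assumptions for illustration.

def build_pseudo_program_prompt(question, steps):
    """Render (variable, description) reasoning steps as numbered
    pseudo-code statements inside a stub function for the LLM to fill in."""
    lines = [f"# Question: {question}", "def solve():"]
    for i, (var, desc) in enumerate(steps, start=1):
        # Each step becomes an assignment whose comment carries the
        # sub-task (e.g. a KG lookup) the LLM should carry out.
        lines.append(f"    step_{i}_{var} = ...  # {desc}")
    lines.append("    return answer")
    return "\n".join(lines)

prompt = build_pseudo_program_prompt(
    "Who directed the film that won Best Picture in 1998?",
    [("film", "retrieve the 1998 Best Picture winner from the KG"),
     ("director", "retrieve the director of that film")],
)
print(prompt)
```

The code-like scaffold constrains the model to one sub-task per statement, which is the structural rigor the framework aims for; the actual prompt format used in the paper may differ.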