🤖 AI Summary
Large language models (LLMs) exhibit loose logical reasoning and contextual inconsistency when performing knowledge base (KB) question answering. Method: We propose a human-inspired two-stage reasoning framework: (1) breadth decomposition—structurally decomposing complex questions into interdependent sub-questions; and (2) depth solving—layered, stepwise reasoning over these sub-questions. A dynamic knowledge boundary model filters relevant KB sources, while a logic-form interface explicitly encodes sub-question dependencies. The framework integrates logic-guided retrieval and reasoning, multi-turn dialogue supervised fine-tuning, reflexive reasoning with confidence calibration, and an iterative corpus synthesis and evaluation pipeline. Contribution/Results: Experiments demonstrate substantial improvements in reasoning trajectory stability, interpretability, and knowledge coverage, alongside reduced redundant reflection. Our approach achieves more accurate and coherent reasoning on complex KB QA tasks.
📝 Abstract
In this paper, we introduce KAG-Thinker, a novel human-like reasoning framework built upon a parameter-light large language model (LLM). Our approach enhances the logical coherence and contextual consistency of the thinking process of LLMs in question-answering (Q&A) tasks on domain-specific knowledge bases (KBs). The framework simulates human cognitive mechanisms for handling complex problems by establishing a structured thinking process. Continuing the **Logical Form** guided retrieval and reasoning technology route of KAG v0.7, it first decomposes complex questions into independently solvable sub-problems (also referred to as logical forms) through **breadth decomposition**. Each sub-problem is represented in two equivalent forms, natural language and logical function, and is further classified as either a Knowledge Retrieval or a Reasoning Analysis task, with dependencies and variable passing explicitly modeled via logical-function interfaces. In the solving process, the Retrieval function performs knowledge retrieval tasks, while the Math and Deduce functions perform reasoning analysis tasks. Secondly, it is worth noting that, in Knowledge Retrieval sub-problems, LLMs and external knowledge sources are regarded as equivalent KBs: a **knowledge boundary** model determines the optimal source via self-regulatory mechanisms such as confidence calibration and reflective reasoning, and a **depth solving** model enhances the comprehensiveness of knowledge acquisition. Finally, instead of reinforcement learning, we employ supervised fine-tuning with multi-turn dialogues to align the model with our structured inference paradigm, thereby avoiding excessive reflection. This is supported by a data evaluation framework and iterative corpus synthesis, which facilitate the generation of detailed reasoning trajectories...
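The breadth-decomposition and depth-solving pipeline described above can be sketched in code. The following is a minimal illustrative sketch, not the paper's actual implementation: the `SubQuestion` structure, the `solve` loop, and the toy `Retrieval`/`Math` executors are all hypothetical names standing in for the framework's logical-form interface and its underlying retrieval/reasoning models.

```python
from dataclasses import dataclass, field

@dataclass
class SubQuestion:
    # Each sub-question carries two equivalent forms: a natural-language
    # form and a logical function (Retrieval, Math, or Deduce).
    var: str                 # output variable bound by this step, e.g. "s1"
    nl: str                  # natural-language form of the sub-question
    func: str                # "Retrieval" | "Math" | "Deduce"
    args: dict               # arguments; values may reference earlier vars
    deps: list = field(default_factory=list)  # variables this step consumes

def solve(plan, executors):
    """Depth solving: execute sub-questions in dependency order,
    passing variables between steps via the logical-function interface."""
    memory = {}
    for sq in plan:
        # Resolve any argument that names an earlier step's output variable.
        resolved = {k: memory.get(v, v) if isinstance(v, str) else v
                    for k, v in sq.args.items()}
        memory[sq.var] = executors[sq.func](sq.nl, resolved)
    return memory

# Toy executors standing in for the real Retrieval / Math / Deduce models.
executors = {
    "Retrieval": lambda nl, a: {"birth_year": 1879},
    "Math": lambda nl, a: 2025 - a["facts"]["birth_year"],
}

# Breadth decomposition of "How old would Einstein be in 2025?"
plan = [
    SubQuestion("s1", "When was Einstein born?", "Retrieval",
                {"query": "Einstein birth year"}),
    SubQuestion("s2", "Compute the age from the birth year.", "Math",
                {"facts": "s1"}, deps=["s1"]),
]
result = solve(plan, executors)
# result["s2"] == 146
```

In a full system, the Retrieval executor would first consult the knowledge boundary model to decide between the LLM's parametric knowledge and an external KB, rather than returning a canned fact as here.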