KAG-Thinker: Teaching Large Language Models to Think with Human-like Reasoning Process

📅 2025-06-21
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) exhibit loose logical reasoning and contextual inconsistency when performing knowledge base (KB) question answering. Method: We propose a human-inspired two-stage reasoning framework: (1) breadth decomposition—structurally decomposing complex questions into interdependent sub-questions; and (2) depth solving—layered, stepwise reasoning over these sub-questions. A dynamic knowledge boundary model filters relevant KB sources, while a logic-form interface explicitly encodes sub-question dependencies. The framework integrates logic-guided retrieval and reasoning, multi-turn dialogue supervised fine-tuning, reflexive reasoning with confidence calibration, and an iterative corpus synthesis and evaluation pipeline. Contribution/Results: Experiments demonstrate substantial improvements in reasoning trajectory stability, interpretability, and knowledge coverage, alongside reduced redundant reflection. Our approach achieves more accurate and coherent reasoning on complex KB QA tasks.

📝 Abstract
In this paper, we introduce KAG-Thinker, a novel human-like reasoning framework built upon a parameter-light large language model (LLM). Our approach enhances the logical coherence and contextual consistency of the thinking process in question-answering (Q&A) tasks on domain-specific knowledge bases (KBs) within LLMs. This framework simulates human cognitive mechanisms for handling complex problems by establishing a structured thinking process. Continuing the Logical Form guided retrieval and reasoning technology route of KAG v0.7, firstly, it decomposes complex questions into independently solvable sub-problems (also referred to as logical forms) through breadth decomposition. Each sub-problem is represented in two equivalent forms, natural language and logical function, and further classified as either a Knowledge Retrieval or a Reasoning Analysis task, with dependencies and variable passing explicitly modeled via logical function interfaces. In the solving process, the Retrieval function is used to perform knowledge retrieval tasks, while the Math and Deduce functions are used to perform reasoning analysis tasks. Secondly, it is worth noting that, in Knowledge Retrieval sub-problem tasks, LLMs and external knowledge sources are regarded as equivalent KBs. We use the knowledge boundary model to determine the optimal source using self-regulatory mechanisms such as confidence calibration and reflective reasoning, and use the depth solving model to enhance the comprehensiveness of knowledge acquisition. Finally, instead of utilizing reinforcement learning, we employ supervised fine-tuning with multi-turn dialogues to align the model with our structured inference paradigm, thereby avoiding excessive reflection. This is supported by a data evaluation framework and iterative corpus synthesis, which facilitate the generation of detailed reasoning trajectories...
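The abstract's breadth decomposition can be pictured as producing a list of sub-problems, each carried in two equivalent forms (natural language and a logical function call) with explicit dependencies between steps. The following sketch is illustrative only; the class and field names are assumptions, not the paper's actual interface, and the decomposition is hard-coded rather than produced by the fine-tuned LLM.

```python
from dataclasses import dataclass, field

@dataclass
class LogicalForm:
    """One sub-problem, kept in two equivalent forms."""
    step_id: str            # e.g. "Step1"
    natural_language: str   # natural-language phrasing of the sub-question
    function: str           # "Retrieval", "Math", or "Deduce" per the abstract
    args: dict              # arguments passed to the logical function
    depends_on: list = field(default_factory=list)  # ids of prerequisite steps

def decompose(question: str) -> list[LogicalForm]:
    """Hypothetical breadth decomposition of a 2-hop question.

    A real system would invoke the trained model here; this stub
    hard-codes one example to show the data flow and variable passing.
    """
    return [
        LogicalForm("Step1", "Who directed Inception?",
                    "Retrieval", {"query": "director of Inception"}),
        LogicalForm("Step2", "When was that director born?",
                    "Retrieval", {"query": "birth date of {Step1}"},
                    depends_on=["Step1"]),
    ]

plan = decompose("When was the director of Inception born?")
for lf in plan:
    print(lf.step_id, lf.function, lf.depends_on)
```

The `{Step1}` placeholder in the second step's query stands for the explicit variable passing between sub-problems that the logical function interface is said to model.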
Problem

Research questions and friction points this paper is trying to address.

Enhancing logical coherence in LLM question-answering tasks
Simulating human cognitive mechanisms for complex problems
Optimizing knowledge retrieval via self-regulatory boundary models
Innovation

Methods, ideas, or system contributions that make the work stand out.

Breadth decomposition for logical sub-problems
Knowledge boundary model for optimal retrieval
Supervised fine-tuning with multi-turn dialogues
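The knowledge boundary idea listed above treats the LLM's parametric knowledge and the external KB as equivalent sources, choosing between them per sub-question via confidence calibration. A minimal sketch, assuming a calibrated self-assessment score and a fixed threshold (both illustrative inventions, not values from the paper):

```python
CONFIDENCE_THRESHOLD = 0.8  # assumed calibration cutoff, for illustration

def llm_confidence(question: str) -> float:
    """Stand-in for a calibrated self-assessment score in [0, 1].

    A real knowledge boundary model would derive this from the LLM
    itself; here a lookup table fakes two contrasting cases.
    """
    known = {
        "capital of France": 0.97,        # well inside the model's knowledge
        "obscure 2023 local statute": 0.30,  # beyond the model's knowledge
    }
    return known.get(question, 0.5)

def answer_source(question: str) -> str:
    """Route to parametric knowledge or external retrieval."""
    if llm_confidence(question) >= CONFIDENCE_THRESHOLD:
        return "llm"           # answer from the model's own knowledge
    return "external_kb"       # fall back to retrieval over the KB

print(answer_source("capital of France"))         # → llm
print(answer_source("obscure 2023 local statute"))  # → external_kb
```

Reflective reasoning would add a second pass that revisits low-confidence answers; the one-shot threshold here is the simplest possible version of that self-regulation.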
Dalong Zhang
Knowledge Engine Team @Inclusion AI, Ant Group
Jun Xu
Knowledge Engine Team @Inclusion AI, Ant Group
Jun Zhou
Knowledge Engine Team @Inclusion AI, Ant Group
Lei Liang
Ant Group
Knowledge Graph, AI
Lin Yuan
Knowledge Engine Team @Inclusion AI, Ant Group
Ling Zhong
Knowledge Engine Team @Inclusion AI, Ant Group
Mengshu Sun
Beijing University of Technology
Deep Learning, Model Compression and Acceleration
Peilong Zhao
Knowledge Engine Team @Inclusion AI, Ant Group
QiWei Wang
Knowledge Engine Team @Inclusion AI, Ant Group
Xiaorui Wang
Professor of Computer Engineering, The Ohio State University
Power Management, Data Centers, Real-Time Embedded Systems, Computer Architecture, Computer Systems
Xinkai Du
Knowledge Engine Team @Inclusion AI, Ant Group
YangYang Hou
Knowledge Engine Team @Inclusion AI, Ant Group
Yu Ao
Knowledge Engine Team @Inclusion AI, Ant Group
ZhaoYang Wang
Knowledge Engine Team @Inclusion AI, Ant Group
Zhengke Gui
Knowledge Engine Team @Inclusion AI, Ant Group
ZhiYing Yi
Knowledge Engine Team @Inclusion AI, Ant Group
Zhongpu Bo
Knowledge Engine Team @Inclusion AI, Ant Group