Theorem-of-Thought: A Multi-Agent Framework for Abductive, Deductive, and Inductive Reasoning in Language Models

📅 2025-06-08
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Large language models (LLMs) exhibit fragile, unstructured, and non-verifiable reasoning in natural language inference (NLI). Method: We propose a multi-agent collaborative reasoning framework that (i) deploys specialized abductive, deductive, and inductive agents in parallel; (ii) explicitly models reasoning chains as structured reasoning graphs; and (iii) introduces an NLI-guided Bayesian belief propagation mechanism to quantify and calibrate logical consistency across reasoning steps. Contribution/Results: By integrating formal reasoning modeling with probabilistic consistency verification, our approach significantly outperforms Chain-of-Thought (CoT), Self-Consistency, and CoT-Decoding on benchmarks including WebOfLies and MultiArith. It not only improves accuracy but also generates clear, traceable, and verifiable reasoning paths, enhancing both robustness and interpretability of LLM-based NLI.

๐Ÿ“ Abstract
Large language models (LLMs) have shown strong performance across natural language reasoning tasks, yet their reasoning processes remain brittle and difficult to interpret. Prompting techniques like Chain-of-Thought (CoT) enhance reliability by eliciting intermediate reasoning steps or aggregating multiple outputs. However, they lack mechanisms for enforcing logical structure and assessing internal coherence. We introduce Theorem-of-Thought (ToTh), a novel framework that models reasoning as collaboration among three parallel agents, each simulating a distinct mode of inference: abductive, deductive, and inductive. Each agent produces a reasoning trace, which is structured into a formal reasoning graph. To evaluate consistency, we apply Bayesian belief propagation guided by natural language inference (NLI), assigning confidence scores to each step. The most coherent graph is selected to derive the final answer. Experiments on symbolic (WebOfLies) and numerical (MultiArith) reasoning benchmarks show that ToTh consistently outperforms CoT, Self-Consistency, and CoT-Decoding across multiple LLMs, while producing interpretable and logically grounded reasoning chains. Our findings suggest a promising direction for building more robust and cognitively inspired LLM reasoning. The implementation is available at https://github.com/KurbanIntelligenceLab/theorem-of-thought.
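The pipeline the abstract describes (three parallel agents produce traces, each trace becomes a reasoning graph whose edges carry NLI-based consistency scores, and the most coherent graph yields the answer) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the agent and NLI functions are hypothetical stubs, and coherence is simplified to the product of edge confidences along a chain, which is what belief propagation reduces to on a linear graph.

```python
# Illustrative sketch of the Theorem-of-Thought (ToTh) selection loop.
# Agent outputs and NLI scores are stubbed; names are assumptions, not
# the authors' API. See the paper's repository for the real system.

def abductive_agent(q):  return ["hypothesize a cause", "check the fit", "answer: 42"]
def deductive_agent(q):  return ["state the premise", "apply the rule", "answer: 42"]
def inductive_agent(q):  return ["observe case 1", "observe case 2", "generalize", "answer: 7"]

def nli_entailment(premise, hypothesis):
    # Stub for an NLI model returning P(hypothesis entailed by premise) in [0, 1].
    return 0.9 if not premise.startswith("observe") else 0.6

def build_graph(trace):
    # Chain each step to the next; each edge carries an NLI confidence.
    return [(a, b, nli_entailment(a, b)) for a, b in zip(trace, trace[1:])]

def coherence(graph):
    # On a chain, propagating beliefs multiplies edge confidences.
    score = 1.0
    for _, _, weight in graph:
        score *= weight
    return score

def theorem_of_thought(question):
    traces = [agent(question) for agent in
              (abductive_agent, deductive_agent, inductive_agent)]
    graphs = [build_graph(t) for t in traces]
    best = max(range(len(graphs)), key=lambda i: coherence(graphs[i]))
    return traces[best][-1]  # final answer from the most coherent trace
```

With these stub scores, the inductive chain accumulates lower confidence than the other two, so the answer from a higher-coherence trace is selected, which mirrors how inconsistent reasoning paths are filtered out.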
Problem

Research questions and friction points this paper is trying to address.

Enhancing logical structure in LLM reasoning processes
Improving internal coherence assessment in reasoning steps
Combining abductive, deductive, and inductive reasoning modes
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-agent framework for diverse reasoning modes
Structured reasoning graphs with Bayesian evaluation
Outperforms existing methods on reasoning benchmarks