🤖 AI Summary
This study investigates the internal mechanisms by which large language models (LLMs) perform propositional logic reasoning—specifically, whether they rely on dedicated neural modules and whether they implement modular, sequential planning.
Method: We design synthetic logic tasks requiring nontrivial planning, train small three-layer Transformers from scratch (achieving 100% test accuracy with interpretable reasoning traces), and conduct causal interventions and module-level circuit analysis on Mistral-7B and Gemma-2-9B.
Contribution/Results: We identify, for the first time, a cross-layer attentional neural circuit that jointly implements logical planning and reasoning. Results demonstrate that LLMs implicitly adopt human-like reasoning strategies, supported by both causal necessity and sufficiency evidence. Crucially, the division of labor between "planning" and "reasoning" is carried out by dynamically coupled attention and MLP submodules, revealing a verifiable, interpretable, and structured reasoning mechanism within the Transformer architecture.
📝 Abstract
Large language models (LLMs) have shown amazing performance on tasks that require planning and reasoning. Motivated by this, we investigate the internal mechanisms that underpin a network's ability to perform complex logical reasoning. We first construct a synthetic propositional logic problem that serves as a concrete test-bed for network training and evaluation. Crucially, this problem demands nontrivial planning to solve. We perform our study on two fronts. First, we pursue an understanding of precisely how a three-layer transformer, trained from scratch, attains perfect test accuracy and solves this problem. We are able to identify certain "planning" and "reasoning" mechanisms in the network that necessitate cooperation between the attention blocks to implement the desired logic. Second, we study how pretrained LLMs, namely Mistral-7B and Gemma-2-9B, solve this problem. We characterize their reasoning circuits through causal intervention experiments, providing necessity and sufficiency evidence for the circuits. We find evidence suggesting that the two models' latent reasoning strategies are surprisingly similar, and human-like. Overall, our work systematically uncovers novel aspects of small and large transformers, and continues the study of how they plan and reason.
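The abstract does not spell out the exact task format, but a propositional logic problem of the kind described can be illustrated with a hedged sketch: given some facts and implication rules (including distractor rules), proving a query requires planning which chain of implications to follow. The `forward_chain` helper below is purely illustrative, not the paper's actual dataset or method.

```python
# Hedged sketch (NOT the paper's dataset): a toy propositional-logic
# instance. Distractor rules mean a solver must "plan" which chain of
# implications actually reaches the query.

def forward_chain(facts, rules, query, max_steps=10):
    """Derive new propositions via modus ponens until the query is
    proved or no new facts appear. `rules` is a list of (body, head)
    pairs, where `body` is a set of premises and `head` a conclusion."""
    known = set(facts)
    for _ in range(max_steps):
        derived = {head for body, head in rules
                   if body <= known and head not in known}
        if not derived:
            break  # fixed point: nothing new can be derived
        known |= derived
        if query in known:
            return True
    return query in known

# Example: from A, (A -> B), (B -> C) we can prove C;
# (D -> E) is a distractor rule whose premise never holds.
facts = {"A"}
rules = [({"A"}, "B"), ({"B"}, "C"), ({"D"}, "E")]
print(forward_chain(facts, rules, "C"))  # True
print(forward_chain(facts, rules, "E"))  # False
```

A planning-heavy variant would include many such distractor chains, so that identifying the relevant rule sequence (rather than exhaustively firing all rules) is the nontrivial part of the task.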