A Mechanistic Interpretation of Syllogistic Reasoning in Auto-Regressive Language Models

📅 2024-08-16

🏛️ arXiv.org

📈 Citations: 3

✨ Influential: 0

🤖 AI Summary

This work investigates whether auto-regressive language models learn systematic syllogistic reasoning or merely exploit superficial statistical patterns. We present a circuit discovery methodology built on two distinct intervention methods, and we test how the discovered mechanism generalizes across syllogistic schemes, model sizes, and architectures. We identify a content-independent, transferable circuit involving middle-term suppression that explains how information flows from the premises to a valid conclusion. The circuit is both sufficient and necessary for the schemes on which the models reach high downstream accuracy (>60%). Belief biases appear as partial contamination from additional attention heads that encode commonsense and contextualized knowledge, which suggests that the mechanism does not rest on generalizable, abstract logical primitives. Our core contributions are: (i) an interpretable, transferable mechanism for syllogistic reasoning in LMs; (ii) evidence that its activation patterns apply to models of different families, together with its limits due to contamination by world knowledge acquired during pre-training; and (iii) intervention-based causal evidence that syllogistic reasoning is carried by an identifiable circuit.

📝 Abstract

Recent studies on logical reasoning in Language Models (LMs) have sparked a debate on whether they can learn systematic reasoning principles during pre-training or merely exploit superficial patterns in the training data. This paper presents a mechanistic interpretation of syllogistic reasoning in LMs to advance the understanding of internal dynamics. Specifically, we present a methodology for circuit discovery aimed at interpreting content-independent reasoning mechanisms. Through two distinct intervention methods, we uncover a sufficient and necessary circuit involving middle-term suppression that elucidates how LMs transfer information to derive valid conclusions from premises. Furthermore, we investigate how belief biases manifest in syllogistic reasoning, finding evidence of partial contamination from additional attention heads responsible for encoding commonsense and contextualized knowledge. Finally, we explore the generalization of the discovered mechanisms across various syllogistic schemes, model sizes and architectures, finding that the identified circuit is sufficient and necessary for the schemes on which the models achieve high downstream accuracy (>60%), and that the activation patterns apply to models of different families. Overall, our findings suggest that LMs indeed learn transferable content-independent reasoning mechanisms, but that, at the same time, such mechanisms do not involve generalizable and abstract logical primitives, being susceptible to contamination by the same world knowledge acquired during pre-training.
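To make the terminology concrete, here is a minimal sketch of a content-independent syllogism prompt. The template ("All A are B. All B are C. Therefore, all A are") and the helper names are illustrative assumptions, not taken from the paper. The middle term is the one that appears in both premises but not in the conclusion.

```python
# Illustrative sketch only: the prompt template and function names are
# assumptions made for this example, not the paper's actual data format.

def middle_term(premise_1, premise_2):
    """Return the single term shared by both premises."""
    shared = set(premise_1) & set(premise_2)
    if len(shared) != 1:
        raise ValueError("premises must share exactly one term")
    return shared.pop()


def conclusion_terms(premise_1, premise_2):
    """Return (subject, predicate) of the conclusion, which omits the middle term."""
    m = middle_term(premise_1, premise_2)
    subject = next(t for t in premise_1 if t != m)
    predicate = next(t for t in premise_2 if t != m)
    return subject, predicate


def build_prompt(a, b, c):
    """Build a prompt whose correct next token is the conclusion's predicate."""
    return f"All {a} are {b}. All {b} are {c}. Therefore, all {a} are"


p1, p2 = ("A", "B"), ("B", "C")
prompt = build_prompt("A", "B", "C")
subject, predicate = conclusion_terms(p1, p2)
```

In this framing, a model that reasons correctly has to carry the predicate `C` forward to the final position while not predicting the middle term `B`, which is the behavior the abstract refers to as middle-term suppression.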

Problem

Research questions and friction points this paper is trying to address.

Investigates syllogistic reasoning in language models

Examines internal dynamics and circuit discovery

Assesses generalization of reasoning mechanisms across models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Circuit discovery for reasoning mechanisms

Middle-term suppression in syllogistic reasoning

Belief biases in attention heads
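The notions of a "necessary" and "sufficient" circuit can be illustrated with a toy example. The sketch below is not the paper's method: the abstract does not name its two intervention methods, so this uses plain zero-ablation on a hypothetical additive model. The head indices, contribution values, and threshold are all invented for illustration.

```python
# Toy illustration only: head contributions, the threshold, and the additive
# scoring model are made-up assumptions, not results or methods from the paper.

def score(head_contributions, active_heads):
    """Sum the contributions of active heads; ablated heads contribute zero."""
    return sum(v for h, v in head_contributions.items() if h in active_heads)


def check_circuit(head_contributions, circuit, threshold):
    """Test a candidate circuit for necessity and sufficiency.

    necessary:  ablating the circuit drops the score below the threshold.
    sufficient: keeping only the circuit still reaches the threshold.
    """
    all_heads = set(head_contributions)
    without_circuit = score(head_contributions, all_heads - circuit)
    only_circuit = score(head_contributions, circuit)
    return {
        "necessary": without_circuit < threshold,
        "sufficient": only_circuit >= threshold,
    }


# Hypothetical (layer, head) -> contribution to the correct-answer score.
contributions = {(0, 1): 0.1, (5, 3): 2.0, (7, 2): 1.5, (9, 0): -0.2}

result = check_circuit(contributions, circuit={(5, 3), (7, 2)}, threshold=3.0)
control = check_circuit(contributions, circuit={(0, 1)}, threshold=3.0)
```

Here `result` reports the candidate circuit as both necessary and sufficient, while `control` reports an unrelated head as neither. Real transformers are not additive in this way, so actual circuit discovery relies on interventions on model activations rather than on summing fixed contributions.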
