Analyzing the Power of Chain of Thought through Memorization Capabilities

📅 2025-11-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether chain-of-thought (CoT) prompting universally enhances the reasoning capabilities of Transformer models, positing that reasoning with transformers is fundamentally a memorization problem. Method: We systematically characterize the memory capacity of fixed-precision Transformers on both finite and infinite reasoning datasets, via parameter complexity analysis, derivation of necessary and sufficient conditions for memorization, and rigorous upper and lower bounds—validated on mathematical reasoning tasks. Contribution/Results: We show that CoT does not uniformly improve performance across all tasks; both CoT and non-CoT models exhibit the same asymptotic parameter complexity for memorization, Θ̃(N), and neither subsumes the other in capability. We identify tasks where CoT fails to enhance reasoning and prove that certain simple infinite datasets cannot be memorized by any fixed-precision Transformer. Our results delineate precise theoretical boundaries of CoT effectiveness and offer a novel memory-centric perspective on reasoning mechanisms in foundation models.

📝 Abstract
It has been shown that the chain of thought (CoT) can enhance the power of large language models (LLMs) to solve certain mathematical reasoning problems. However, the capacity of CoT is still not fully explored. As an important instance, the following basic question has not yet been answered: Does CoT expand the capability of transformers across all reasoning tasks? We demonstrate that reasoning with transformers is essentially a memorization problem for reasoning datasets. Thus, examining the power of CoT across all reasoning tasks amounts to analyzing the memorization capabilities of CoT transformers. In this paper, we give a complete description of the memorization capabilities of fixed-precision transformers with or without CoT and give a negative answer to the above-mentioned question. Precisely, we first give necessary and sufficient conditions for fixed-precision transformers with and without CoT to memorize a finite reasoning dataset and show that these two conditions do not imply each other. Then, we give lower and upper bounds for the number of parameters needed for transformers with or without CoT to memorize a finite reasoning dataset with $N$ elements, which are $\widetilde{\Theta}(N)$ in all cases. This implies that there exist reasoning tasks for which CoT does not enhance the reasoning power of transformers, leading to a negative answer to the above-mentioned question. Finally, we give the first results on memorizing infinite reasoning datasets by CoT transformers and show that some simple infinite datasets cannot be memorized by transformers with or without CoT.
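For readers unfamiliar with the soft-Theta notation in the abstract, $\widetilde{\Theta}(N)$ means linear in $N$ up to polylogarithmic factors. The exact polylog exponents are part of the paper's theorems and are not reproduced here; the form below is only an illustrative unpacking of the notation:

$$
P(N) = \widetilde{\Theta}(N)
\quad\Longleftrightarrow\quad
\exists\, c_1, c_2, k > 0:\;
c_1 \,\frac{N}{\log^{k} N} \;\le\; P(N) \;\le\; c_2 \, N \log^{k} N,
$$

where $P(N)$ denotes the number of parameters needed to memorize a reasoning dataset with $N$ elements. Since both CoT and non-CoT transformers satisfy this same bound, neither architecture can asymptotically reduce the parameter count for memorization, which is the core of the paper's negative answer.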
Problem

Research questions and friction points this paper is trying to address.

Examining whether Chain of Thought expands transformer reasoning capabilities universally
Analyzing memorization capabilities of transformers with and without Chain of Thought
Determining parameter requirements for memorizing finite and infinite reasoning datasets
Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzing CoT memorization capabilities in transformers
Comparing parameter bounds for transformers with and without CoT
Demonstrating CoT does not enhance all reasoning tasks
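To make the "reasoning as memorization" framing concrete, here is a toy sketch (not the paper's transformer construction): a finite reasoning dataset of N question–answer pairs memorized by a plain lookup table, whose storage grows linearly in N, echoing the Θ̃(N) parameter scaling. The dataset and helper names are hypothetical, chosen only for illustration:

```python
# Toy illustration of memorizing a finite reasoning dataset.
# A lookup table achieves exact recall with storage linear in N;
# the paper proves fixed-precision transformers (with or without CoT)
# likewise need Theta-tilde(N) parameters for this task.

def memorize(dataset):
    """Store each (question, answer) pair verbatim."""
    return {q: a for q, a in dataset}

def answer(table, question):
    """Exact recall only: no generalization beyond the stored pairs."""
    return table.get(question)

# N synthetic "reasoning" examples: question -> final answer.
N = 1000
dataset = [(f"{i}+{i}", str(2 * i)) for i in range(N)]
model = memorize(dataset)

print(answer(model, "7+7"))   # recalls a memorized answer
print(len(model))             # storage grows linearly with N
```

The point of the sketch is only the scaling: exact memorization of N items costs storage proportional to N, and the paper shows CoT does not change this asymptotic cost for transformers.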
Lijia Yu
Institute of AI for Industries, Nanjing, China
Xiao-Shan Gao
AMSS, CAS
Automated Reasoning, Symbolic Computation, Machine Learning Theory
Lijun Zhang
Institute of AI for Industries, Nanjing, China; University of Chinese Academy of Sciences; Key Laboratory of System Software of Chinese Academy of Sciences