On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning

📅 2024-06-20

🏛️ Annual Meeting of the Association for Computational Linguistics

📈 Citations: 9

✨ Influential: 0

🤖 AI Summary

Problem: Chain-of-thought (CoT) reasoning is often explained as extending a language model's computational power, since recurrent neural networks (RNNs) and Transformers with additional scratch space are known to be Turing complete. That comparison is a category error: Turing machines decide language membership, whereas language models define probability distributions over strings. Method: The paper formalizes CoT reasoning in a probabilistic setting and compares recurrent and Transformer language models with CoT against probabilistic Turing machines rather than deterministic ones. Contribution/Results: It presents several results on representational capacity, showing that recurrent and Transformer language models with CoT reasoning can represent the same family of distributions over strings as probabilistic Turing machines.

📝 Abstract

The performance of modern language models (LMs) has been improved by chain-of-thought (CoT) reasoning, i.e., the process of generating intermediate results that guide the model towards a final answer. A possible explanation for this improvement is that CoT reasoning extends an LM's computational power, as RNNs and transformers with additional scratch space are known to be Turing complete. Comparing LMs to Turing machines, however, introduces a category error - Turing machines decide language membership, whereas LMs define distributions over strings. To bridge this gap, we formalize CoT reasoning in a probabilistic setting. We present several results on the representational capacity of recurrent and transformer LMs with CoT reasoning, showing that they can represent the same family of distributions over strings as probabilistic Turing machines.
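The distinction the abstract draws, between deciding membership and defining a distribution, can be made concrete with a toy example. The sketch below is not the paper's construction; it assumes, for illustration only, that a CoT-augmented model emits both output symbols and scratch symbols, and that the distribution of interest is obtained by deleting the scratch symbols and summing the probabilities of all generations that leave the same output string. The names `next_token_probs`, `strip_scratch`, the scratch symbols `<a>` and `<b>`, and the end marker `$` are all hypothetical.

```python
from collections import defaultdict
from fractions import Fraction

SCRATCH = {"<a>", "<b>"}  # hypothetical scratch (CoT) symbols
EOS = "$"                 # hypothetical end-of-string marker


def decides_membership(s):
    """A decider answers yes/no; it assigns no probabilities."""
    return s in {"x", "y"}


def next_token_probs(prefix):
    """Toy autoregressive model: one scratch token, one output token, then EOS."""
    if len(prefix) == 0:
        # The "reasoning" step: a fair coin flip recorded in the scratch space.
        return {"<a>": Fraction(1, 2), "<b>": Fraction(1, 2)}
    if len(prefix) == 1:
        if prefix[0] == "<a>":
            return {"x": Fraction(1)}
        return {"x": Fraction(1, 2), "y": Fraction(1, 2)}
    return {EOS: Fraction(1)}


def enumerate_strings(prefix=(), prob=Fraction(1)):
    """Yield every complete generation (scratch included) with its probability."""
    for token, p in next_token_probs(prefix).items():
        if token == EOS:
            yield prefix, prob * p
        else:
            yield from enumerate_strings(prefix + (token,), prob * p)


def strip_scratch(tokens):
    """Delete scratch symbols, keeping only the output string."""
    return "".join(t for t in tokens if t not in SCRATCH)


def output_distribution():
    """Marginalize over scratch: sum the generations that share an output string."""
    dist = defaultdict(Fraction)
    for tokens, p in enumerate_strings():
        dist[strip_scratch(tokens)] += p
    return dict(dist)


if __name__ == "__main__":
    print(decides_membership("x"))  # a yes/no answer
    print(output_distribution())    # a probability for each output string
```

In this toy model two different generations, `<a> x` and `<b> x`, collapse to the same output `x`, so its probability is the sum of both paths. A membership decider has no counterpart to that sum, which is why the comparison class in the abstract is probabilistic Turing machines.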

Problem

Research questions and friction points this paper is trying to address.

Chain-of-Thought Reasoning

Neural Language Models

Computational Power

Innovation

Methods, ideas, or system contributions that make the work stand out.

Chain-of-Thought

Neural Language Models

Equivalence with Probabilistic Turing Machines

🔎 Similar Papers

From Sparse Dependence to Sparse Attention: Unveiling How Chain-of-Thought Enhances Transformer Sample Efficiency