AI Summary
This paper addresses the fundamental mismatch between the training objective (chain-of-thought, CoT, risk) and the test objective (end-to-end risk) under CoT supervision. We establish the first statistical learning theory framework for CoT reasoning. Central to our approach is a novel information-theoretic measure, the *CoT information content* $\mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(\epsilon; \mathcal{H})$, which explicitly quantifies the relationship between CoT risk and end-to-end risk. Leveraging this measure, we derive a tight sample complexity upper bound of $d / \mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(\epsilon; \mathcal{H})$, substantially improving upon the standard supervised learning bound $d / \epsilon$. We further prove an information-theoretic lower bound, rigorously establishing the optimality of our upper bound. Our analysis integrates information-theoretic techniques, risk decomposition, and hypothesis class complexity modeling. This work provides the first theoretical guarantee for the improved generalization of CoT-based reasoning and identifies the CoT information content as the intrinsic statistical complexity measure governing this paradigm.
Abstract
Learning complex functions that involve multi-step reasoning poses a significant challenge for standard supervised learning from input-output examples. Chain-of-thought (CoT) supervision, which provides intermediate reasoning steps together with the final output, has emerged as a powerful empirical technique, underpinning much of the recent progress in the reasoning capabilities of large language models. This paper develops a statistical theory of learning under CoT supervision. A key characteristic of the CoT setting, in contrast to standard supervision, is the mismatch between the training objective (CoT risk) and the test objective (end-to-end risk). A central part of our analysis, distinguished from prior work, is explicitly linking these two types of risk to achieve sharper sample complexity bounds. This is achieved via the *CoT information measure* $\mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(\epsilon; \mathcal{H})$, which quantifies the additional discriminative power gained from observing the reasoning process. The main theoretical results demonstrate how CoT supervision can yield significantly faster learning rates compared to standard E2E supervision. Specifically, it is shown that the sample complexity required to achieve a target E2E error $\epsilon$ scales as $d/\mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(\epsilon; \mathcal{H})$, where $d$ is a measure of hypothesis class complexity, which can be much faster than standard $d/\epsilon$ rates. Information-theoretic lower bounds in terms of the CoT information are also obtained. Together, these results suggest that CoT information is a fundamental measure of statistical complexity for learning under chain-of-thought supervision.
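As a sketch, the two sample complexity scalings stated above can be placed side by side (this display only restates the abstract's claimed rates; the notation $\lesssim$/$\asymp$ hides constants and logarithmic factors, which the abstract does not specify):

```latex
% Sample complexity to reach end-to-end error \epsilon (sketch of the stated scalings):
n_{\mathrm{CoT}}(\epsilon) \;\lesssim\; \frac{d}{\mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(\epsilon; \mathcal{H})}
\qquad \text{vs.} \qquad
n_{\mathrm{E2E}}(\epsilon) \;\asymp\; \frac{d}{\epsilon}.
```

Whenever the CoT information measure satisfies $\mathcal{I}_{\mathcal{D}, h_\star}^{\mathrm{CoT}}(\epsilon; \mathcal{H}) \gg \epsilon$, i.e., the reasoning trace discriminates hypotheses far more strongly than the final answer alone, the CoT bound is correspondingly smaller.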