How Well do LLMs Compress Their Own Chain-of-Thought? A Token Complexity Approach

📅 2025-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large language models (LLMs) often produce verbose and inefficient chain-of-thought (CoT) reasoning, raising fundamental questions about the trade-off between reasoning length and accuracy. Method: We introduce *token complexity*—the minimal number of tokens required to solve a given task—and use it to derive information-theoretic limits on the compression–accuracy trade-off. Through systematic cross-task and cross-model experiments, we identify a universal length–accuracy threshold phenomenon, and we conduct controlled evaluations using explicit compression instructions (e.g., word limits, punctuation removal). Contribution/Results: We find that existing prompt-based compression strategies fall substantially short of the theoretical optimum. Token complexity predicts the minimal feasible reasoning length per task, providing a quantifiable benchmark for efficient inference and motivating adaptive compression grounded in task-specific information requirements.

📝 Abstract
Chain-of-thought prompting has emerged as a powerful technique for enabling large language models (LLMs) to solve complex reasoning tasks. However, these reasoning chains can be verbose, raising concerns about efficiency. In response, recent works have sought to decrease response lengths through simple prompting strategies (e.g. 'be concise'). In this work, we conduct the first systematic study of the relationship between reasoning length and model performance across a diverse range of compression instructions (e.g. 'use 10 words or less' or 'remove all punctuation'). In doing so, we discover a universal tradeoff between reasoning length and accuracy that persists across even very distinct reasoning chains. We demonstrate that this tradeoff emerges from a sharp threshold behavior at the question level: each task has an intrinsic 'token complexity' - a minimal number of tokens required for successful problem-solving. We show how token complexity enables us to compute information-theoretic limits on the accuracy-compression tradeoff, and find that prompt-based compression strategies operate far from these theoretical limits. This suggests there may be significant room for improvement and our framework provides a benchmark to help researchers evaluate progress in reasoning efficiency. Our work also highlights the importance of adaptive compression -- giving shorter responses for easier questions -- and we show that token complexity is a useful tool for measuring this capability.
Problem

Research questions and friction points this paper is trying to address.

Explores relationship between reasoning length and model performance.
Identifies intrinsic token complexity for problem-solving in LLMs.
Evaluates efficiency of prompt-based compression strategies in reasoning tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Systematic study of the reasoning length–performance relationship
Introduction of token complexity as a measure of problem-solving efficiency
Framework for evaluating adaptive compression strategies
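One way to quantify the adaptive-compression capability listed above is to check whether a model's response lengths track per-question token complexity. This is a hedged sketch with invented numbers, using plain correlation as the adaptivity score; the paper's own metric may differ.

```python
import numpy as np

# Hypothetical per-question values (invented for illustration):
tau = np.array([25, 40, 80, 15, 60])         # estimated token complexities
flat_lengths = np.array([90, 88, 91, 92, 89])  # near-constant: non-adaptive
adaptive_lengths = np.array([30, 45, 85, 20, 65])  # tracks tau: adaptive

def adaptivity(resp_lengths, tau):
    """Correlation between response length and token complexity: values
    near 1 mean the model spends tokens where the task demands them."""
    return float(np.corrcoef(resp_lengths, tau)[0, 1])

print(adaptivity(flat_lengths, tau))      # near zero: uniform verbosity
print(adaptivity(adaptive_lengths, tau))  # near 1: length tracks difficulty
```

A model that answers every question at roughly the same length scores near zero regardless of its accuracy, which is exactly the failure mode adaptive compression is meant to expose.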