Why Does Your CoT Prompt (Not) Work? Theoretical Analysis of Prompt Space Complexity, its Interaction with Answer Space During CoT Reasoning with LLMs: A Recurrent Perspective

📅 2025-03-13

📈 Citations: 0

✨ Influential: 0

career value

178K/year

🤖 AI Summary

Current chain-of-thought (CoT) reasoning relies on generic, one-size-fits-all prompting (e.g., “think step-by-step”), lacking theoretical grounding—leading to inefficient search in high-dimensional prompt spaces and degraded computational tractability. Method: Grounded in recursion theory, the authors establish the first formal link between prompt-space complexity and navigational efficacy over answer spaces; they rigorously prove that universal prompts impair LLM computability on high-complexity tasks, whereas task-specific prompts enjoy provable theoretical advantages—and underscore the indispensable role of human supervision in efficient space navigation. Contribution/Results: The proposed task-adaptive prompting framework achieves an average 19.7% improvement in reasoning accuracy, significantly outperforming Tree-of-Thought (ToT), Graph-of-Thought (GoT), and unsupervised prompt-generation methods.

Technology Category

Application Category

📝 Abstract

Despite the remarkable successes of Large Language Models (LLMs), their fundamental Transformer architecture possesses inherent theoretical limitations that restrict their capability to handle reasoning tasks with increasing computational complexity. Chain-of-Thought (CoT) prompting has emerged as a practical solution, supported by several theoretical studies. However, current CoT-based methods (including ToT, GoT, etc.) generally adopt a"one-prompt-fits-all"strategy, using fixed templates (e.g.,"think step by step") across diverse reasoning tasks. This method forces models to navigate an extremely complex prompt space to identify effective reasoning paths. The current prompt designing research are also heavily relying on trial-and-error rather than theoretically informed guidance. In this paper, we provide a rigorous theoretical analysis of the complexity and interplay between two crucial spaces: the prompt space (the space of potential prompt structures) and the answer space (the space of reasoning solutions generated by LLMs) in CoT reasoning. We demonstrate how reliance on a single universal prompt (e.g. think step by step) can negatively impact the theoretical computability of LLMs, illustrating that prompt complexity directly influences the structure and effectiveness of the navigation in answer space. Our analysis highlights that sometimes human supervision is critical for efficiently navigating the prompt space. We theoretically and empirically show that task-specific prompting significantly outperforms unsupervised prompt generation, emphasizing the necessity of thoughtful human guidance in CoT prompting.

Problem

Research questions and friction points this paper is trying to address.

Analyzes complexity of prompt and answer spaces in CoT reasoning.

Explores limitations of universal prompts in LLM reasoning tasks.

Demonstrates need for task-specific, human-guided CoT prompting strategies.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Analyzes prompt and answer space complexity.

Advocates task-specific over universal prompting.

Highlights human supervision in prompt design.

🔎 Similar Papers

No similar papers found.