The Periodic Table of LLM Reasoning: A Structured Survey of Reasoning Paradigms, Methods, and Failure Modes

📅 2026-06-09

🤖 AI Summary

Large language models (LLMs) exhibit instability in complex reasoning tasks, are highly sensitive to prompting strategies, and show notable limitations in multi-step reasoning and cross-domain generalization. This work presents a systematic review of over 300 studies and introduces the first comprehensive taxonomy of LLM reasoning paradigms, encompassing key directions such as chain-of-thought, multi-hop, mathematical, and commonsense reasoning. The framework integrates technical dimensions including prompt engineering, model architecture, training objectives, reward modeling, and evaluation benchmarks. Through bibliometric and qualitative synthesis, the study systematically characterizes common failure modes, traces methodological evolution, and highlights emerging frontiers (including meta-reasoning and self-evolving frameworks), thereby laying the groundwork for more robust, interpretable, and generalizable reasoning systems.

📝 Abstract

Large Language Models (LLMs) have achieved strong performance across natural language processing tasks, yet reliable reasoning remains an open challenge. Although modern LLMs show progress in structured inference, multi-step problem solving, and contextual understanding, their reasoning behavior is often inconsistent and sensitive to prompting strategies, task design, and model scale. This survey provides a systematic analysis of more than 300 recent papers from arXiv, Semantic Scholar, Google Scholar, Papers with Code, and the ACL Anthology to examine how reasoning capabilities emerge in LLMs and where they fail. We make three main contributions. First, we introduce a structured taxonomy of LLM reasoning research, covering Chain-of-Thought reasoning, multi-hop reasoning, mathematical reasoning, commonsense reasoning, visual and temporal reasoning, code and algorithmic reasoning, retrieval-augmented reasoning, tool-augmented and agentic reasoning, and reinforcement learning-based reasoning. Second, we analyze methodological trends across these paradigms, including prompting methods, model architectures, training objectives, reward modeling, and evaluation benchmarks. Third, we synthesize recurring limitations and failure modes, such as reasoning hallucinations, brittle multi-step inference, weak causal abstraction, and poor cross-domain generalization. By organizing a rapidly expanding literature, this survey offers a unified view of the current capabilities and limitations of reasoning in LLMs. We also identify emerging research directions, including meta-reasoning, self-evolving reasoning frameworks, multimodal reasoning, and socially grounded reasoning. Overall, this work aims to serve as a reference for developing more robust, interpretable, and generalizable reasoning systems in future language models.

Problem

Research questions and friction points this paper is trying to address.

LLM reasoning

reasoning failure modes

multi-step inference

cross-domain generalization

reasoning hallucinations

Innovation

Methods, ideas, or system contributions that make the work stand out.

structured taxonomy

reasoning paradigms

failure modes