When to Think Deeply: Inhibitory Deliberation for LLM Reasoning

📅 2026-06-04

🤖 AI Summary

This work addresses the inefficiency of large language models that invoke computationally expensive deep reasoning on every input, including inputs that do not need it. The authors propose IDPR, a framework that first generates a rapid “intuitive” answer and then uses an inhibition controller to decide whether to release that answer or suppress it in favor of costly slow-path reasoning. The decision is conditioned on the intuitive answer and its fast-side evidence (confidence, logit margin, parseability, and generation cost) rather than on the input alone. On a held-out 5,000-example mathematical reasoning test set, IDPR activates slow-path reasoning for only 8.20% of examples and improves overall accuracy from 47.90% to 48.92%. Under the same slow-call budget, random routing lowers accuracy to 46.76% and the strongest confidence-based baseline reaches 48.22%; IDPR also achieves the highest corrective precision.
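The fast-then-inhibit control flow described in the summary can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `FastResult`, `route`, and `heuristic_inhibit_score` are hypothetical names, and the hand-written scoring rule is only a stand-in for the trained inhibition controller.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class FastResult:
    """Fast answer plus fast-side evidence (field names are assumptions)."""
    answer: str
    confidence: float    # e.g. mean token probability, in [0, 1]
    logit_margin: float  # gap between the top-1 and top-2 logits
    parseable: bool      # whether a final answer could be extracted
    num_tokens: int      # generation cost of the fast pass


def heuristic_inhibit_score(question: str, fast: FastResult) -> float:
    """Hypothetical stand-in for the trained controller.

    Returns a score in [0, 1]; higher means "suppress the fast answer".
    """
    if not fast.parseable:
        return 1.0
    return 1.0 - fast.confidence


def route(
    question: str,
    fast_model: Callable[[str], FastResult],
    slow_model: Callable[[str], str],
    inhibit_score: Callable[[str, FastResult], float],
    threshold: float,
) -> Tuple[str, bool]:
    """Return (final_answer, used_slow_path)."""
    fast = fast_model(question)            # always run the cheap pass first
    score = inhibit_score(question, fast)  # conditioned on the response itself
    if score >= threshold:
        return slow_model(question), True  # inhibit: fall back to slow reasoning
    return fast.answer, False              # release the fast answer
```

The point of the sketch is that the routing decision sees the fast answer and its evidence, whereas an input-only router would have to decide before any answer exists.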

📝 Abstract

Reasoning Large Language Models can improve problem-solving performance through deliberative inference, but invoking slow reasoning for every input is computationally expensive and often unnecessary. We propose IDPR, a framework for response-conditioned inhibitory deliberation. IDPR first generates a concise intuitive answer and then uses an inhibition controller to decide whether that specific response should be released or suppressed in favor of slow reasoning. Unlike input-only routers, the inhibition controller conditions on the fast answer and fast-side evidence, including confidence, logit margin, parseability, and generation cost. We train the controller from paired fast-slow outcomes and select the inhibition threshold on a held-out validation set under an accuracy-first slow-call budget. On a held-out 5,000-example mathematical reasoning test set, IDPR invokes slow reasoning on only 8.20% of examples and improves accuracy from 47.90% to 48.92%. Under the same slow-call budget, random routing decreases accuracy to 46.76%, while the strongest confidence-based baseline reaches 48.22%. IDPR also achieves the highest corrective precision, showing that response-conditioned inhibition better identifies fast answers that benefit from slow reasoning.
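Threshold selection under a slow-call budget might look like the sketch below. The abstract does not define "accuracy-first", so the rule here is an assumption: among thresholds whose validation slow-call rate stays within the budget, pick the one with the highest accuracy and break ties toward fewer slow calls. The function name `select_threshold` and its inputs are hypothetical.

```python
from typing import Sequence, Tuple


def select_threshold(
    scores: Sequence[float],
    fast_correct: Sequence[bool],
    slow_correct: Sequence[bool],
    budget: float,
) -> Tuple[float, float, float]:
    """Pick an inhibition threshold on validation data.

    scores[i]       controller score for example i (higher = inhibit)
    fast_correct[i] whether the fast answer was correct
    slow_correct[i] whether the slow answer was correct
    budget          maximum allowed fraction of slow calls

    Returns (threshold, accuracy, slow_call_rate).
    """
    n = len(scores)
    # An infinite threshold routes nothing, so a feasible candidate always exists.
    candidates = sorted(set(scores)) + [float("inf")]
    best_key, best_t = None, None
    for t in candidates:
        routed = [s >= t for s in scores]
        slow_rate = sum(routed) / n
        if slow_rate > budget:
            continue  # violates the slow-call budget
        correct = sum(
            sc if r else fc
            for r, fc, sc in zip(routed, fast_correct, slow_correct)
        )
        key = (correct / n, -slow_rate)  # accuracy first, then fewer slow calls
        if best_key is None or key > best_key:
            best_key, best_t = key, t
    return best_t, best_key[0], -best_key[1]
```

For scale, and assuming the reported percentages are taken over all 5,000 test examples, 8.20% corresponds to 410 slow calls, while 47.90% and 48.92% correspond to 2,395 and 2,446 correct answers. The net gain is therefore 51 answers, which, if the slow answer simply replaces the fast one on routed examples, is the number of corrections minus the number of regressions among those 410 calls.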

Problem

Research questions and friction points this paper is trying to address.

deliberative inference

inhibitory deliberation

reasoning efficiency

slow reasoning

fast-slow routing

Innovation

Methods, ideas, or system contributions that make the work stand out.

inhibitory deliberation

response-conditioned routing

fast-slow reasoning