AI Summary
Large reasoning models (LRMs) frequently over-reason on simple tasks, incurring unnecessary computational overhead. To address this, we propose a complexity-adaptive dynamic chain-of-thought (CoT) switching framework that enables a single model to automatically select between short- and long-chain reasoning paths based on task difficulty. Our core contribution is a lightweight, end-to-end trainable switching module that makes real-time path decisions using dual-mode relative performance signals, such as confidence differentials and intermediate-step time ratios. The method integrates prompt engineering with supervised switching training and requires no architectural modifications to the base model. Evaluated on multiple reasoning benchmarks (GSM8K, MMLU, HotpotQA), our framework reduces average inference computation by 20-30% while preserving original accuracy on complex tasks. This yields substantial improvements in both inference efficiency and generalization across diverse reasoning challenges.
Abstract
Large reasoning models (LRMs) excel at solving complex tasks by leveraging long chain-of-thought (CoT) reasoning. However, this often leads to overthinking on simple tasks, resulting in unnecessary computational overhead. We observe that LRMs inherently possess the capability for efficient short CoT reasoning, which can be reliably elicited through prompt design. To leverage this capability, we propose ThinkSwitcher, a framework that enables a single LRM to dynamically switch between short and long CoT modes based on task complexity. ThinkSwitcher introduces a lightweight switching module trained with supervision signals derived from the relative performance of each reasoning mode across tasks. Experiments on multiple reasoning benchmarks show that ThinkSwitcher reduces computational cost by 20-30% while maintaining high accuracy on complex tasks. This demonstrates the effectiveness of ThinkSwitcher as a scalable and efficient solution for unified LRM deployment.
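The switching decision described above can be sketched as a simple rule over the dual-mode signals the summary mentions (confidence differential and step-time ratio). This is a hypothetical illustration only: the names `SwitchSignals` and `decide_mode`, and the threshold values, are our own assumptions, not the paper's actual trained module, which learns this decision from supervision signals.

```python
# Hypothetical sketch of a complexity-adaptive CoT switch.
# All names and thresholds here are illustrative assumptions;
# ThinkSwitcher learns this decision with a trained module.
from dataclasses import dataclass

@dataclass
class SwitchSignals:
    short_conf: float       # model confidence under the short-CoT prompt
    long_conf: float        # model confidence under the long-CoT prompt
    step_time_ratio: float  # intermediate-step time, long mode / short mode

def decide_mode(sig: SwitchSignals,
                conf_margin: float = 0.05,
                time_budget: float = 2.0) -> str:
    """Choose 'short' when long CoT is barely more confident,
    or when its extra per-step time exceeds the budget."""
    conf_gap = sig.long_conf - sig.short_conf
    if conf_gap <= conf_margin or sig.step_time_ratio > time_budget:
        return "short"
    return "long"

# Easy task: long CoT adds little confidence, so take the cheap path.
print(decide_mode(SwitchSignals(0.91, 0.93, 2.5)))  # short
# Hard task: long CoT is much more confident and not too slow.
print(decide_mode(SwitchSignals(0.40, 0.90, 1.2)))  # long
```

In the actual framework, this thresholding is replaced by a lightweight module trained on the relative performance of the two modes per task, which is what allows the 20-30% compute reduction without hand-tuned cutoffs.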