🤖 AI Summary
Large language models (LLMs) often suffer from verbose and inefficient chain-of-thought (CoT) reasoning due to excessive deliberation; existing approaches, such as multi-path distillation or preference learning, are prone to overfitting and heavily reliant on high-quality synthetic data. Method: We propose a stepwise reasoning compression framework that employs long-short switched sampling to generate diverse reasoning trajectories, constructs dual-objective preference pairs (accuracy vs. length), trains separate high-accuracy and short-length models, and combines them via parameter interpolation to yield a balanced model. Contribution/Results: Our method decouples accuracy and length optimization, eliminating dependence on curated synthetic data and mitigating overfitting. Experiments across multiple mathematical reasoning benchmarks show a 30-50% reduction in reasoning length while maintaining or improving accuracy, with consistent performance across diverse backbone architectures. Code and data will be publicly released.
📝 Abstract
Recent advances in Chain-of-Thought (CoT) prompting have substantially improved the reasoning capabilities of Large Language Models (LLMs). However, these methods often suffer from overthinking, leading to unnecessarily lengthy or redundant reasoning traces. Existing approaches attempt to mitigate this issue by curating multiple reasoning chains for training LLMs, but their effectiveness is often constrained by the quality of the generated data and prone to overfitting. To address this challenge, we propose Reasoning Compression ThroUgh Stepwise Trials (ReCUT), a novel method aimed at balancing the accuracy and length of reasoning trajectories. Specifically, ReCUT employs a stepwise exploration mechanism and a long-short switched sampling strategy, enabling LLMs to incrementally generate diverse reasoning paths. These paths are evaluated and used to construct preference pairs to train two specialized models (Gemini LLMs): one optimized for reasoning accuracy, the other for shorter reasoning. A final integrated model is obtained by interpolating the parameters of these two models. Experimental results across multiple math reasoning datasets and backbone models demonstrate that ReCUT significantly reduces reasoning lengths by approximately 30-50% while maintaining or improving reasoning accuracy compared to various baselines. All code and data will be released via https://github.com/NEUIR/ReCUT.
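The final step of the pipeline, merging the accuracy-optimized and length-optimized models by parameter interpolation, can be sketched as a simple weighted average of the two models' parameters. This is a minimal illustration, not the paper's implementation: the function name, the mixing weight `alpha`, and the use of plain Python lists in place of model tensors are all assumptions for the sake of a self-contained example.

```python
def interpolate_parameters(acc_params, short_params, alpha=0.5):
    """Linearly interpolate two parameter dicts:
    theta = alpha * theta_acc + (1 - alpha) * theta_short.

    `alpha` controls the accuracy/length trade-off: alpha=1.0 keeps
    the accuracy-optimized model, alpha=0.0 keeps the short-reasoning
    model. (The value 0.5 here is illustrative, not from the paper.)
    """
    assert acc_params.keys() == short_params.keys(), "models must share architecture"
    return {
        name: [alpha * a + (1.0 - alpha) * s
               for a, s in zip(acc_params[name], short_params[name])]
        for name in acc_params
    }

# Toy example: flat lists stand in for the real weight tensors.
accuracy_model = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
short_model = {"layer.weight": [3.0, 4.0], "layer.bias": [2.0]}
merged = interpolate_parameters(accuracy_model, short_model, alpha=0.5)
# merged["layer.weight"] -> [2.0, 3.0], merged["layer.bias"] -> [1.0]
```

In practice the same elementwise averaging would be applied over the state dicts of the two fine-tuned LLMs; both models must share the same architecture for the parameters to align key by key.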