Control-R: Towards controllable test-time scaling

📅 2025-05-30
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Large reasoning models (LRMs) commonly suffer from under-reasoning and over-reasoning during long chain-of-thought (L-CoT) inference. Method: This paper proposes the Reasoning Control Field (RCF) framework, which models inference as tree search and dynamically regulates reasoning depth and breadth via structured control signals injected at test time. To enable controllable knowledge transfer, we further introduce Conditional Distillation Fine-tuning (CDF), trained on the Control-R-4K dataset. Contribution/Results: To our knowledge, this is the first work achieving tunable and scalable test-time L-CoT inference. Our 32B-parameter model sets new state-of-the-art performance on AIME2024 and MATH500 benchmarks, significantly improving inference controllability, task-specific adaptation accuracy, and generalization stability.
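The summary describes RCF as injecting structured control signals (regulating reasoning depth and breadth) into the model at test time. As a purely illustrative sketch of what such injection could look like, the snippet below prepends a control field to a problem before inference; the bracketed format and the `depth`/`breadth` field names are assumptions for illustration, not the paper's actual schema:

```python
# Hypothetical sketch of test-time control-field injection in the spirit
# of RCF: a structured control signal is prepended to the problem text
# before the model sees it. Field names and format are illustrative
# assumptions, not the paper's actual specification.

def build_controlled_prompt(problem: str, depth: int, breadth: int) -> str:
    """Prepend a structured reasoning control field to the problem."""
    control_field = (
        "[REASONING CONTROL]\n"
        f"depth: {depth}\n"      # how deep the reasoning tree may grow
        f"breadth: {breadth}\n"  # how many branches to explore per step
        "[/REASONING CONTROL]\n"
    )
    return control_field + problem


prompt = build_controlled_prompt("Solve: 2x + 3 = 11", depth=4, breadth=2)
print(prompt.splitlines()[1])  # depth: 4
```

Varying the control values at inference time, rather than retraining, is what would make the reasoning budget tunable per task.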

๐Ÿ“ Abstract
This paper target in addressing the challenges of underthinking and overthinking in long chain-of-thought (CoT) reasoning for Large Reasoning Models (LRMs) by introducing Reasoning Control Fields (RCF)--a novel test-time approach that injects structured control signals to guide reasoning from a tree search perspective. RCF enables models to adjust reasoning effort according to given control conditions when solving complex tasks. Additionally, we present the Control-R-4K dataset, which consists of challenging problems annotated with detailed reasoning processes and corresponding control fields. To further enhance reasoning control, we propose a Conditional Distillation Finetuning (CDF) method, which trains model--particularly Control-R-32B--to effectively adjust reasoning effort during test time. Experimental results on benchmarks such as AIME2024 and MATH500 demonstrate that our approach achieves state-of-the-art performance at the 32B scale while enabling a controllable Long CoT reasoning process (L-CoT). Overall, this work introduces an effective paradigm for controllable test-time scaling reasoning.
Problem

Research questions and friction points this paper is trying to address.

Address underthinking and overthinking in long CoT reasoning for LRMs
Introduce Reasoning Control Fields to guide test-time reasoning
Propose Conditional Distillation Finetuning to adjust reasoning effort dynamically
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Reasoning Control Fields (RCF) for guided reasoning
Presents Control-R-4K dataset with annotated reasoning processes
Proposes Conditional Distillation Finetuning (CDF) method
Di Zhang
Fudan University

Weida Wang
Tongji University

Junxian Li
NSEC Lab, Shanghai Jiaotong University
AI Security · Reasoning · Data Mining

Xunzhi Wang
Nankai University

Jiatong Li
PhD candidate, Hong Kong Polytechnic University
Natural Language Processing · Bioinformatics · Molecule Discovery

Jianbo Wu
University of California, Merced

Jingdi Lei
PhD student, Nanyang Technological University
Vision Language Models · Language Model Reasoning · Machine Learning · Artificial Intelligence

Haonan He
University of Science and Technology of China

Peng Ye
Shanghai Artificial Intelligence Laboratory

Shufei Zhang
Shanghai Artificial Intelligence Laboratory

Wanli Ouyang
Shanghai Artificial Intelligence Laboratory

Yuqiang Li
Central South University
Internal Combustion Engine · Combustion · Emissions · Mechanism

Dongzhan Zhou
Researcher at Shanghai AI Lab
AI4Science · Computer Vision · Deep Learning