Control-R: Towards controllable test-time scaling

📅 2025-05-30
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
Large reasoning models (LRMs) commonly suffer from under-reasoning and over-reasoning during long chain-of-thought (L-CoT) inference. Method: This paper proposes the Reasoning Control Field (RCF) framework, which models inference as tree search and dynamically regulates reasoning depth and breadth via structured control signals injected at test time. To enable controllable knowledge transfer, we further introduce Conditional Distillation Fine-tuning (CDF), trained on the Control-R-4K dataset. Contribution/Results: To our knowledge, this is the first work achieving tunable and scalable test-time L-CoT inference. Our 32B-parameter model sets new state-of-the-art performance on AIME2024 and MATH500 benchmarks, significantly improving inference controllability, task-specific adaptation accuracy, and generalization stability.
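The summary describes RCF as injecting structured control signals (regulating reasoning depth and breadth) into the model at test time. As a purely illustrative sketch of what such injection could look like, the snippet below prepends a control field to a problem before inference; the bracketed format and the `depth`/`breadth` field names are assumptions for illustration, not the paper's actual schema:

```python
# Hypothetical sketch of test-time control-field injection in the spirit
# of RCF: a structured control signal is prepended to the problem text
# before the model sees it. Field names and format are illustrative
# assumptions, not the paper's actual specification.

def build_controlled_prompt(problem: str, depth: int, breadth: int) -> str:
    """Prepend a structured reasoning control field to the problem."""
    control_field = (
        "[REASONING CONTROL]\n"
        f"depth: {depth}\n"      # how deep the reasoning tree may grow
        f"breadth: {breadth}\n"  # how many branches to explore per step
        "[/REASONING CONTROL]\n"
    )
    return control_field + problem


prompt = build_controlled_prompt("Solve: 2x + 3 = 11", depth=4, breadth=2)
print(prompt.splitlines()[1])  # depth: 4
```

Varying the control values at inference time, rather than retraining, is what would make the reasoning budget tunable per task.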

๐Ÿ“ Abstract
This paper target in addressing the challenges of underthinking and overthinking in long chain-of-thought (CoT) reasoning for Large Reasoning Models (LRMs) by introducing Reasoning Control Fields (RCF)--a novel test-time approach that injects structured control signals to guide reasoning from a tree search perspective. RCF enables models to adjust reasoning effort according to given control conditions when solving complex tasks. Additionally, we present the Control-R-4K dataset, which consists of challenging problems annotated with detailed reasoning processes and corresponding control fields. To further enhance reasoning control, we propose a Conditional Distillation Finetuning (CDF) method, which trains model--particularly Control-R-32B--to effectively adjust reasoning effort during test time. Experimental results on benchmarks such as AIME2024 and MATH500 demonstrate that our approach achieves state-of-the-art performance at the 32B scale while enabling a controllable Long CoT reasoning process (L-CoT). Overall, this work introduces an effective paradigm for controllable test-time scaling reasoning.
Problem

Research questions and friction points this paper is trying to address.

Address underthinking and overthinking in long CoT reasoning for LRMs
Introduce Reasoning Control Fields to guide test-time reasoning
Propose Conditional Distillation Finetuning to adjust reasoning effort dynamically
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Reasoning Control Fields (RCF) for guided reasoning
Presents Control-R-4K dataset with annotated reasoning processes
Proposes Conditional Distillation Finetuning (CDF) method
Di Zhang
Fudan University

Weida Wang
Tongji University

Junxian Li
NSEC Lab, Shanghai Jiaotong University
AI Security · Reasoning · Data Mining

Xunzhi Wang
Nankai University

Jiatong Li
PhD candidate, Hong Kong Polytechnic University
Natural Language Processing · Bioinformatics · Molecule Discovery

Jianbo Wu
University of California, Merced

Jingdi Lei
PhD student, Nanyang Technological University
Vision Language Models · Language Model Reasoning · Machine Learning · Artificial Intelligence

Haonan He
University of Science and Technology of China

Peng Ye
Shanghai Artificial Intelligence Laboratory

Shufei Zhang
Shanghai Artificial Intelligence Laboratory

Wanli Ouyang
Shanghai Artificial Intelligence Laboratory

Yuqiang Li
Central South University
Internal Combustion Engine · Combustion · Emissions · Mechanism

Dongzhan Zhou
Researcher at Shanghai AI Lab
AI4Science · Computer Vision · Deep Learning