AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization

📅 2025-04-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Long-chain-of-thought (Long-CoT) reasoning improves performance on complex tasks but incurs substantial computational overhead and exhibits task-dependent efficacy, necessitating adaptive depth control. This paper proposes the first two-level adaptive reasoning framework: (1) at the group level, a difficulty-aware strategy dynamically selects between long- and short-CoT reasoning styles; (2) at the instance level, simplicity-preferring training optimizes per-step reasoning quality, integrated with hybrid CoT modeling and dynamic routing of reasoning paths. Our approach breaks away from the fixed-depth paradigm of conventional Long-CoT, enabling demand-driven inference length adjustment. Evaluated on five mathematical reasoning benchmarks, it reduces average reasoning steps by over 50%, significantly lowering computational cost, while maintaining or improving accuracy.

📝 Abstract
Recently, long-thought reasoning models have achieved strong performance on complex reasoning tasks, but often incur substantial inference overhead, making efficiency a critical concern. Our empirical analysis reveals that the benefit of using Long-CoT varies across problems: while some problems require elaborate reasoning, others show no improvement or even degraded accuracy. This motivates adaptive reasoning strategies that tailor reasoning depth to the input. However, prior work primarily reduces redundancy within long reasoning paths, limiting exploration of more efficient strategies beyond the Long-CoT paradigm. To address this, we propose a novel two-stage framework for adaptive and efficient reasoning. First, we construct a hybrid reasoning model by merging long and short CoT models to enable diverse reasoning styles. Second, we apply bi-level preference training to guide the model to select suitable reasoning styles (group-level) and to prefer concise, correct reasoning within each style group (instance-level). Experiments demonstrate that our method significantly reduces inference costs compared to baseline approaches while maintaining performance. Notably, on five mathematical datasets, the average length of reasoning is reduced by more than 50%, highlighting the potential of adaptive strategies to optimize reasoning efficiency in large language models. Our code is coming soon at https://github.com/StarDewXXX/AdaR1
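The first stage described in the abstract, merging long- and short-CoT models into one hybrid model, is often done by linearly interpolating model weights. The sketch below illustrates that idea on toy state dicts; the function name, the merge ratio `alpha`, and the scalar "weights" are illustrative assumptions, not the paper's exact recipe.

```python
# Hedged sketch: build a hybrid model by linear weight interpolation
# between a long-CoT model and a short-CoT model. `alpha` (assumed)
# controls how much the hybrid leans toward the long-CoT style.

def merge_state_dicts(long_sd, short_sd, alpha=0.5):
    """Linearly interpolate two state dicts with identical keys/shapes."""
    assert long_sd.keys() == short_sd.keys()
    return {
        k: alpha * long_sd[k] + (1 - alpha) * short_sd[k]
        for k in long_sd
    }

# Toy example with scalars standing in for weight tensors.
long_model = {"w1": 1.0, "w2": -2.0}
short_model = {"w1": 0.0, "w2": 2.0}
hybrid = merge_state_dicts(long_model, short_model, alpha=0.5)
print(hybrid)  # {'w1': 0.5, 'w2': 0.0}
```

In practice the same interpolation would be applied tensor-by-tensor over two fine-tuned checkpoints that share an architecture.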
Problem

Research questions and friction points this paper is trying to address.

Optimize reasoning efficiency in large language models
Adapt reasoning depth to input complexity
Reduce inference costs while maintaining performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid reasoning model combining long and short CoT
Bi-level preference training for adaptive reasoning
Reduces average reasoning length by over 50% while maintaining accuracy
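The bi-level preference training above can be sketched with a DPO-style pairwise objective applied at both levels: group-level pairs contrast a long-CoT and a short-CoT response to the same problem, while instance-level pairs contrast a concise correct trace with a longer one in the same style. The choice of a DPO-style loss, the `beta` value, and all log-probabilities below are illustrative assumptions, not the paper's exact formulation.

```python
import math

def preference_loss(logp_chosen, logp_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO-style loss: push the policy to rank `chosen` above `rejected`,
    measured relative to a frozen reference model."""
    margin = beta * ((logp_chosen - ref_chosen) - (logp_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log sigmoid(margin)

# Group level (illustrative numbers): on an easy problem, prefer the
# short-CoT response over the long-CoT response.
group_loss = preference_loss(logp_chosen=-5.0, logp_rejected=-9.0,
                             ref_chosen=-6.0, ref_rejected=-8.0)

# Instance level (illustrative numbers): within one style group, prefer
# the shorter correct trace over a longer correct one.
instance_loss = preference_loss(logp_chosen=-4.0, logp_rejected=-7.0,
                                ref_chosen=-4.5, ref_rejected=-6.0)

print(group_loss > 0 and instance_loss > 0)  # True
```

When the policy already ranks the chosen trace higher than the reference does, the margin is positive and the loss falls below log 2; training on both pair types jointly teaches style selection and conciseness.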
Haotian Luo
Sun Yat-sen University
Haiying He
China Agricultural University
LLM · MLLM · Agent
Yibo Wang
Tsinghua University
Jinluan Yang
Zhejiang University
Rui Liu
Didichuxing Co. Ltd
Naiqiang Tan
Didichuxing Co. Ltd
Xiaochun Cao
Sun Yat-sen University
Computer Vision · Artificial Intelligence · Multimedia · Machine Learning
Dacheng Tao
Nanyang Technological University
artificial intelligence · machine learning · computer vision · image processing · data mining
Li Shen
Sun Yat-sen University