O1-Pruner: Length-Harmonizing Fine-Tuning for O1-Like Reasoning Pruning

📅 2025-01-22
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the high computational cost and latency of chain-of-thought large language models (e.g., OpenAI's O1), this paper proposes Length-Harmonizing Fine-Tuning (O1-Pruner), a framework for optimizing inference efficiency under accuracy constraints. Methodologically, O1-Pruner first estimates the model's baseline performance through pre-sampling, then applies reinforcement-learning-style constrained fine-tuning to enable difficulty-aware control of reasoning length. The objective rewards shorter reasoning sequences while penalizing any drop below baseline accuracy, effectively pruning redundant reasoning paths. Evaluated on multiple mathematical reasoning benchmarks, O1-Pruner reduces average inference latency by up to 42% while improving accuracy by 1.3–2.7 percentage points. Notably, it compresses inference length adaptively without compromising, and indeed while enhancing, model performance, offering a practical path to efficient deployment of long-thinking LLMs.

📝 Abstract
Recently, long-thought reasoning LLMs, such as OpenAI's O1, have adopted extended reasoning processes similar to how humans ponder complex problems. This reasoning paradigm significantly enhances the model's problem-solving abilities and has achieved promising results. However, the long-thought reasoning process leads to a substantial increase in inference time. A pressing challenge is reducing the inference overhead of long-thought LLMs while ensuring accuracy. In this paper, we experimentally demonstrate that long-thought reasoning models struggle to allocate token budgets effectively based on problem difficulty and suffer from reasoning redundancy. To address this, we propose Length-Harmonizing Fine-Tuning (O1-Pruner), which aims to minimize reasoning overhead while maintaining accuracy. This fine-tuning method first estimates the LLM's baseline performance through pre-sampling and then uses RL-style fine-tuning to encourage the model to generate shorter reasoning processes under accuracy constraints. This allows the model to achieve efficient reasoning with lower redundancy while maintaining accuracy. Experiments on various mathematical reasoning benchmarks show that O1-Pruner not only significantly reduces inference overhead but also achieves higher accuracy, providing a novel and promising solution to this challenge. Our code is coming soon at https://github.com/StarDewXXX/O1-Pruner
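The pre-sampling plus RL-style objective described in the abstract can be sketched as a per-sample reward: shorter-than-baseline reasoning earns a positive length term, while the accuracy term penalizes falling below the pre-sampled baseline accuracy. This is a minimal illustration under our own naming assumptions (`Baseline`, `length_harmonizing_reward`, and `lam` are illustrative, not the paper's API):

```python
from dataclasses import dataclass


@dataclass
class Baseline:
    """Pre-sampled statistics for one problem (illustrative names)."""
    mean_length: float    # mean token length over reference samples
    mean_accuracy: float  # fraction of reference samples answered correctly


def length_harmonizing_reward(sample_length: float,
                              sample_correct: bool,
                              baseline: Baseline,
                              lam: float = 1.0) -> float:
    """Reward sketch for RL-style fine-tuning under an accuracy constraint.

    The length term is positive when the sampled reasoning is shorter than
    the pre-sampled baseline; lam trades length savings against accuracy.
    The accuracy term rewards correct answers relative to baseline accuracy.
    """
    length_term = baseline.mean_length / sample_length - 1.0
    accuracy_term = (1.0 if sample_correct else 0.0) - baseline.mean_accuracy
    return lam * length_term + accuracy_term
```

A correct answer at half the baseline length scores well; a wrong answer at double the length is penalized on both terms, which is the "shorter reasoning under accuracy constraints" behavior the method targets.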
Problem

Research questions and friction points this paper is trying to address.

O1 Model Optimization
Problem Solving Efficiency
Accuracy Improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

O1-Pruner
Optimization
Efficiency Enhancement
Haotian Luo
Shenzhen Campus of Sun Yat-sen University
Li Shen
Shenzhen Campus of Sun Yat-sen University
Haiying He
China Agricultural University
LLM, MLLM, Agent
Yibo Wang
Tsinghua University
Shiwei Liu
University of Oxford
Wei Li
Didichuxing Co. Ltd
Naiqiang Tan
Didichuxing Co. Ltd
Xiaochun Cao
Sun Yat-sen University
Computer Vision, Artificial Intelligence, Multimedia, Machine Learning
Dacheng Tao
Nanyang Technological University
Artificial Intelligence, Machine Learning, Computer Vision, Image Processing, Data Mining