ThinkPilot: Steering Reasoning Models via Automated Think-prefixes Optimization

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Large reasoning models (LRMs) suffer from inefficient inference and frequent goal divergence, and existing training-free remedies are constrained by rigid heuristics or purely descriptive, non-actionable analyses. This paper introduces ThinkPilot, a training-free reasoning optimization framework that couples an evolutionary algorithm with a fine-grained taxonomy of reasoning behaviors to automatically discover and optimize *think-prefixes*: structured prompts that steer reasoning trajectories toward desired goals. The method requires no model fine-tuning, is plug-and-play, and shows strong task adaptability and cross-task generalization. Evaluated on DeepSeek-R1-Distill-Qwen-32B, it reduces the StrongREJECT score from 27.0% to 0.7% while simultaneously improving inference efficiency, instruction adherence, and safety, alleviating the accuracy-reasoning-length trade-off.

📝 Abstract
Large Reasoning Models (LRMs) are powerful, but they still suffer from inefficient and off-target reasoning. Current training-free methods are limited to either rigid heuristics or descriptive, non-actionable analyses. In this paper, we introduce ThinkPilot, a training-free framework that automatically optimizes LRM reasoning. It uses an evolutionary process to generate think-prefixes, instructions whose evolution is driven by a taxonomy of reasoning behaviors and which guide models toward superior performance. Extensive experiments demonstrate ThinkPilot's broad effectiveness: it significantly improves the accuracy-length trade-off for efficient reasoning, drastically improves safety (for example, cutting the StrongREJECT score of DeepSeek-R1-Distill-Qwen-32B from 27.0% to 0.7%), and enhances instruction following. It also synergizes with existing training-based methods. Our analysis reveals that think-prefixes can reliably control LRMs' reasoning behaviors, and that different tasks have strong preferences for specific behavioral distributions. By automatically identifying and eliciting these behaviors, ThinkPilot provides a generalizable framework for aligning LRM reasoning with task demands. Data and code are available at https://github.com/teqkilla/ThinkPilot
Problem

Research questions and friction points this paper is trying to address.

LRMs exhibit inefficient and off-target reasoning that wastes tokens and diverges from task goals
Existing training-free methods rely on rigid heuristics or descriptive, non-actionable analyses
How to automatically steer reasoning toward better accuracy, safety, and instruction following without fine-tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes LRM reasoning automatically, with no model training
Generates and evolves think-prefixes through an evolutionary search process
Controls reasoning behaviors via a fine-grained behavioral taxonomy
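The evolutionary idea above can be sketched in a few lines: maintain a population of candidate think-prefixes, score each, keep the best, and mutate them to seed the next generation. This is a minimal toy illustration, not the paper's implementation; the seed prefixes, the `score_prefix` fitness proxy, and the cue list standing in for the behavioral taxonomy are all assumptions for demonstration.

```python
import random

SEED_PREFIXES = [
    "First, verify what the question is actually asking.",
    "Outline a brief plan before solving.",
    "Check each step for errors before continuing.",
]


def score_prefix(prefix: str) -> float:
    """Placeholder fitness. A real implementation would prepend the prefix
    to the model's think block and measure accuracy, output length, or a
    safety score; here a toy proxy favors shorter prefixes."""
    return 1.0 / (1 + len(prefix))


def mutate(prefix: str, rng: random.Random) -> str:
    """Toy mutation: append one behavior-eliciting cue drawn from a small
    illustrative list (assumed categories, not the paper's taxonomy)."""
    cues = ["Be concise.", "Double-check the final answer.", "Avoid redundant steps."]
    return prefix + " " + rng.choice(cues)


def evolve(seeds, generations=3, population=6, keep=2, seed=0):
    """Simple (mu + lambda)-style loop: keep the top `keep` prefixes each
    generation and fill the rest of the population with mutated copies."""
    rng = random.Random(seed)
    pool = list(seeds)
    for _ in range(generations):
        parents = sorted(pool, key=score_prefix, reverse=True)[:keep]
        children = [mutate(rng.choice(parents), rng) for _ in range(population - keep)]
        pool = parents + children
    return max(pool, key=score_prefix)


best = evolve(SEED_PREFIXES)
print(best)
```

With the toy fitness, the short unmutated seeds survive selection, so the loop returns one of the original prefixes; swapping in a model-based fitness is what would make mutation pay off.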
Sunzhu Li (Li Auto Inc., China)
Zhiyu Lin (Beijing Jiaotong University)
Shuling Yang (Li Auto Inc., China)
Jiale Zhao (Li Auto Inc., China)
Wei Chen (Li Auto Inc., China)