Understanding Reasoning in Thinking Language Models via Steering Vectors

📅 2025-06-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Controlling the reasoning process of chain-of-thought (CoT) large language models (LLMs) remains challenging. This paper shows, in DeepSeek-R1-Distill models, that key reasoning behaviors (expressing uncertainty, generating examples to validate hypotheses, and backtracking in reasoning chains) are mediated by separable linear directions in activation space. Building on this finding, the authors propose an interpretable intervention method based on steering vectors that enables stable, targeted control of these behaviors. Through systematic behavioral experiments on 500 tasks across 10 diverse categories, activation-space analysis, and steering vector extraction, they demonstrate precise modulation of reasoning paths, with consistent control across two DeepSeek-R1-Distill models of different architectures, improving both the controllability and the transparency of the reasoning process.

📝 Abstract
Recent advances in large language models (LLMs) have led to the development of thinking language models that generate extensive internal reasoning chains before producing responses. While these models achieve improved performance, controlling their reasoning processes remains challenging. This work presents a steering approach for thinking LLMs by analyzing and manipulating specific reasoning behaviors in DeepSeek-R1-Distill models. Through a systematic experiment on 500 tasks across 10 diverse categories, we identify several reasoning behaviors exhibited by thinking models, including expressing uncertainty, generating examples for hypothesis validation, and backtracking in reasoning chains. We demonstrate that these behaviors are mediated by linear directions in the model's activation space and can be controlled using steering vectors. By extracting and applying these vectors, we provide a method to modulate specific aspects of the model's reasoning process, such as its tendency to backtrack or express uncertainty. Our approach offers practical tools for steering reasoning processes in thinking models in a controlled and interpretable manner. We validate our steering method using two DeepSeek-R1-Distill models, demonstrating consistent control across different model architectures.
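As a rough sketch of the steering-vector technique the abstract describes: a common recipe (not necessarily the paper's exact procedure) extracts a behavior direction as the difference of mean activations between prompts that exhibit the behavior and prompts that do not, then adds a scaled copy of that direction to hidden states at generation time. All function and variable names below are illustrative:

```python
import numpy as np

def extract_steering_vector(acts_with, acts_without):
    """Difference-of-means steering vector.

    acts_with:    (n, d) hidden states from examples exhibiting the
                  behavior (e.g. backtracking).
    acts_without: (m, d) hidden states from examples that do not.
    Returns a unit-norm direction in activation space.
    """
    v = acts_with.mean(axis=0) - acts_without.mean(axis=0)
    return v / np.linalg.norm(v)

def apply_steering(hidden_states, direction, alpha):
    """Add the scaled direction to every position's hidden state.

    alpha > 0 pushes activations toward the behavior; alpha < 0 away.
    """
    return hidden_states + alpha * direction
```

In practice the vector would be extracted at one chosen transformer layer and injected at that same layer via a forward hook during decoding; the numpy form above only illustrates the arithmetic.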
Problem

Research questions and friction points this paper is trying to address.

Control reasoning processes in thinking language models
Identify and manipulate specific reasoning behaviors
Modulate reasoning aspects like backtracking and uncertainty
Innovation

Methods, ideas, or system contributions that make the work stand out.

Control reasoning via steering vectors
Manipulate linear activation directions
Modulate backtracking and uncertainty
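The modulation idea above can be made concrete with a projection readout: if a behavior lives along a unit direction, the dot product of a hidden state with that direction measures how strongly the behavior is represented, and steering with coefficient alpha shifts that measurement by exactly alpha. This is a hypothetical sketch, not the paper's evaluation code:

```python
import numpy as np

def behavior_strength(hidden, direction):
    """Projection of a hidden state onto a unit behavior direction;
    larger values mean the behavior is more strongly represented."""
    return float(hidden @ direction)

def steer(hidden, direction, alpha):
    """alpha > 0 amplifies the behavior, alpha < 0 suppresses it."""
    return hidden + alpha * direction
```

Because the direction is unit-norm, `behavior_strength(steer(h, v, alpha), v)` equals `behavior_strength(h, v) + alpha`, which is what makes the intervention interpretable and tunable.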