Controllable Mathematical Reasoning via Self-Optimizing Thought Vectors

📅 2025-10-24

🤖 AI Summary

This work addresses the lack of controllability in the mathematical reasoning of large language models (LLMs). We propose a dynamic reasoning control method based on learnable Thought Vectors: low-dimensional, differentiable representations that encode control signals. These vectors are optimized end-to-end with entropy-minimization rewards, guiding the model to focus on critical reasoning steps without requiring external reward annotations. Empirical analysis shows that Thought Vectors cluster strongly and yield low-entropy distributions across distinct control conditions, supporting their semantic interpretability and structural consistency. On GSM8K, our approach achieves 90.1% accuracy with the Gemma-2-9B model and attains a controllability score of 0.42, substantially outperforming baseline methods. This work introduces a paradigm for controllable reasoning in LLMs that enables interpretable, annotation-free steering of stepwise mathematical inference.

📝 Abstract

We present a novel approach for controllable mathematical reasoning that leverages self-optimizing thought vectors with entropy minimization. Our method introduces learnable thought vectors that dynamically modulate the internal reasoning process of large language models. Using Gemma-2-9B on GSM8K, we achieve 90.1% accuracy with a controllability score of 0.42, demonstrating that entropy-based rewards effectively guide focused reasoning patterns without requiring external reward annotations. Our analysis reveals distinct thought vector clusters and consistent low-entropy distributions across control conditions, validating our framework for controllable AI reasoning.
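To make the idea of an entropy-based reward concrete, here is a minimal sketch in plain Python. The abstract does not give the exact reward, so the form used here (negative mean Shannon entropy of the per-token next-token distributions) is an assumption, and the names `softmax`, `token_entropy`, and `entropy_reward` are hypothetical helpers rather than the authors' code. The toy logits are illustrative values, not data from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one step's logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_reward(step_logits: Sequence[Sequence[float]]) -> float:
    """Hypothetical reward: negative mean per-token entropy.

    More peaked (lower-entropy) predictions give a higher reward.
    This form is an assumption, not the paper's stated definition.
    """
    entropies = [token_entropy(softmax(row)) for row in step_logits]
    return -sum(entropies) / len(entropies)


# Toy logits over a 4-token vocabulary for two generated sequences.
focused = [[8.0, 0.0, 0.0, 0.0], [0.0, 9.0, 0.0, 0.0]]
diffuse = [[1.0, 1.0, 1.0, 1.0], [0.5, 0.4, 0.6, 0.5]]

r_focused = entropy_reward(focused)
r_diffuse = entropy_reward(diffuse)
```

Under this assumed definition the reward is never positive, and the sequence with peaked predictions scores higher than the near-uniform one. Because it is computed from the model's own output distribution, it needs no external reward labels, which is the property the abstract emphasizes.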

Problem

Research questions and friction points this paper is trying to address.

Enhancing mathematical reasoning controllability via thought vectors

Optimizing reasoning patterns through entropy minimization techniques

Achieving high accuracy without external reward annotations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-optimizing thought vectors modulate reasoning process

Entropy minimization guides reasoning without external rewards

Learnable vectors create distinct clusters for control
