🤖 AI Summary
This work addresses the limited controllability of mathematical reasoning in large language models (LLMs). We propose a dynamic reasoning-control method based on learnable thought vectors: low-dimensional, differentiable representations that encode control signals. The vectors are optimized end-to-end with entropy-minimization rewards, which guide the model toward focused reasoning without requiring external reward annotations. Empirical analysis shows that thought vectors form distinct clusters and yield consistently low-entropy distributions across control conditions, which supports their interpretability and structural consistency. On GSM8K with Gemma-2-9B, the approach achieves 90.1% accuracy and a controllability score of 0.42, reported as substantially higher than baseline methods. The result is a framework for controllable reasoning in LLMs that steers stepwise mathematical inference without annotated rewards.
📝 Abstract
We present a novel approach for controllable mathematical reasoning that leverages self-optimizing thought vectors with entropy minimization. Our method introduces learnable thought vectors that dynamically modulate the internal reasoning process of large language models. Using Gemma-2-9B on GSM8K, we achieve 90.1% accuracy with a controllability score of 0.42, demonstrating that entropy-based rewards effectively guide focused reasoning patterns without requiring external reward annotations. Our analysis reveals distinct thought vector clusters and consistent low-entropy distributions across control conditions, validating our framework for controllable AI reasoning.
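The abstract does not state the exact form of the entropy-based reward. As a rough illustration only, the sketch below assumes the reward is the negative mean Shannon entropy of the model's next-token distributions over a reasoning trace, so that more peaked (lower-entropy) predictions score higher. The function names and the toy logits are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def shannon_entropy(probs):
    """Shannon entropy in nats; zero-probability terms contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def entropy_reward(step_logits):
    """Assumed reward: negative mean entropy across decoding steps.

    step_logits is a list of per-step logit vectors (hypothetical input);
    lower average entropy gives a higher (less negative) reward.
    """
    entropies = [shannon_entropy(softmax(logits)) for logits in step_logits]
    return -sum(entropies) / len(entropies)


# Toy traces over a 4-token vocabulary (illustrative values only).
focused = [[8.0, 0.0, 0.0, 0.0], [0.0, 9.0, 0.0, 0.0]]
diffuse = [[1.0, 1.0, 1.0, 1.0], [0.5, 0.4, 0.6, 0.5]]

print(entropy_reward(focused) > entropy_reward(diffuse))
```

Under this assumption the reward needs no labels: it depends only on the model's own output distributions, which is consistent with the abstract's claim that no external reward annotations are required. How the reward is combined with the thought vectors during training is not specified in the abstract and is left out here.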