Towards Interpretable and Inference-Optimal CoT Reasoning with Sparse Autoencoder-Guided Generation

📅 2025-10-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the problem of uncontrollable reasoning paths and imbalanced exploitation-exploration trade-offs in large language models (LLMs) for mathematical reasoning. Methodologically, it introduces a representation-guided chain-of-thought generation framework: sparse autoencoders (SAEs) first extract salient neural activation features; k-means clustering then constructs a reasoning state graph; an edge-weight reward function is defined to quantify the exploitation-exploration balance along reasoning trajectories, enabling controllable decoding via graph-guided generation. The key contribution lies in the first integration of representation learning with graph-structured modeling to explicitly regulate both diversity and consistency of reasoning paths. Experiments demonstrate significant accuracy improvements across multiple mathematical reasoning benchmarks, alongside enhanced interpretability and robustness of the reasoning process—effectively mitigating biases arising from over-exploitation or over-exploration.
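The pipeline described above (SAE codes → k-means clusters → weighted reasoning state graph) can be sketched roughly as follows. This is an illustrative assumption, not the authors' code: the SAE itself is omitted and stood in for by toy "codes", and all names (`kmeans`, `build_transition_graph`) are hypothetical.

```python
import numpy as np
from collections import defaultdict

def farthest_point_init(codes, k):
    """Deterministic seeding: greedily pick mutually distant codes."""
    centers = [codes[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(codes - c, axis=1) for c in centers], axis=0)
        centers.append(codes[d.argmax()])
    return np.array(centers)

def kmeans(codes, k, iters=20):
    """Plain k-means over SAE codes; returns one cluster id per token."""
    centers = farthest_point_init(codes, k)
    for _ in range(iters):
        # assign each code to its nearest center
        dists = np.linalg.norm(codes[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # recompute centers (keep the old center if a cluster empties)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = codes[labels == j].mean(axis=0)
    return labels

def build_transition_graph(labels):
    """Weighted edges = counts of cluster-to-cluster transitions."""
    graph = defaultdict(lambda: defaultdict(int))
    for a, b in zip(labels[:-1], labels[1:]):
        graph[int(a)][int(b)] += 1
    return graph

# Toy stand-in for SAE codes: a 12-token trace drawn from 3 well-separated
# activation regimes (4 tokens each), so the clusters are unambiguous.
rng = np.random.default_rng(1)
codes = np.vstack([rng.normal(loc=c, scale=0.1, size=(4, 8)) for c in (0.0, 5.0, 10.0)])
labels = kmeans(codes, k=3)
graph = build_transition_graph(labels)
```

The resulting graph's edge weights record how often one reasoning "state" follows another in the training traces, which is what the reward function is defined over.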

📝 Abstract
We propose a novel method that leverages sparse autoencoders (SAEs) and clustering techniques to analyze the internal token representations of large language models (LLMs) and guide generations in mathematical reasoning tasks. Our approach first trains an SAE to generate sparse vector representations for training tokens, then applies k-means clustering to construct a graph where vertices represent token clusters and weighted edges capture sequential token transitions. Using this graph, we define an edge-weight based reward function to quantify adherence to established reasoning traces, thereby identifying exploitative reasoning trajectories. Additionally, we measure generation diversity from clustering to assess the extent of exploration. Our findings indicate that balancing both exploitation and exploration is crucial for achieving high accuracy in mathematical reasoning tasks. During generation, the SAE can serve as a scalable reward model to guide generations, ensuring a balanced trade-off between exploitation and exploration. This prevents extreme behaviors in either direction, ultimately fostering a higher-quality reasoning process in LLMs.
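A minimal sketch of the edge-weight reward and diversity measure mentioned in the abstract, assuming the graph from the previous step. The exact functional forms here (normalized edge weight for exploitation, fraction of distinct clusters visited for exploration) are assumptions for illustration, not the paper's definitions.

```python
def edge_reward(graph, trace):
    """Mean normalized edge weight along a cluster trace (exploitation)."""
    scores = []
    for a, b in zip(trace[:-1], trace[1:]):
        total = sum(graph.get(a, {}).values())
        w = graph.get(a, {}).get(b, 0)
        scores.append(w / total if total else 0.0)
    return sum(scores) / len(scores)

def diversity(trace):
    """Fraction of distinct clusters visited (exploration proxy)."""
    return len(set(trace)) / len(trace)

# A hand-built graph: each cluster mostly self-transitions (weight 3)
# and occasionally advances to the next cluster (weight 1).
graph = {0: {0: 3, 1: 1}, 1: {1: 3, 2: 1}, 2: {2: 3}}
exploit = edge_reward(graph, [0, 0, 1, 1, 2])  # follows frequent edges
explore = diversity([0, 0, 1, 1, 2])
```

A trace that only revisits one cluster would score high on `edge_reward` but near-minimal on `diversity`, which is the over-exploitation failure mode the paper targets.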
Problem

Research questions and friction points this paper is trying to address.

Analyzing token representations to guide mathematical reasoning
Balancing exploitation and exploration for reasoning accuracy
Preventing extreme behaviors in language model reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Sparse autoencoders guide token representation analysis
Clustering constructs graph to model token transitions
Edge-weight reward balances exploitation and exploration
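The balancing idea in the bullets above can be illustrated with a one-step guided-decoding sketch: candidate next clusters are scored by a weighted sum of an exploitation term (edge weight out of the current cluster) and a novelty term for unvisited clusters. The scoring form and the `lam` trade-off parameter are hypothetical, introduced only to make the exploitation-exploration balance concrete.

```python
def guided_step(graph, visited, current, candidates, lam=0.5):
    """Return the candidate cluster with the best balanced score."""
    def score(c):
        total = sum(graph.get(current, {}).values())
        # exploitation: how well-trodden is the edge current -> c?
        exploit = graph.get(current, {}).get(c, 0) / total if total else 0.0
        # exploration: reward clusters not yet visited on this trajectory
        novelty = 0.0 if c in visited else 1.0
        return (1 - lam) * exploit + lam * novelty
    return max(candidates, key=score)

graph = {0: {0: 3, 1: 1}, 1: {1: 3, 2: 1}}
# From cluster 0 (already visited), the novel cluster 1 outscores
# looping on 0 despite the stronger self-edge.
choice = guided_step(graph, visited={0}, current=0, candidates=[0, 1])
```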
Daniel Zhao
University of California, San Diego
Abhilash Shankarampeta
University of California, San Diego
Lanxiang Hu
University of California, San Diego
Machine Learning · Distributed Systems · Embedded Systems
Tajana Rosing
Distinguished Professor, UCSD
computer architecture · cyber-physical systems · system energy efficiency
Hao Zhang
University of California, San Diego