🤖 AI Summary
This work investigates whether the deep recurrent Transformer Huginn-3.5B spontaneously develops latent Chain-of-Thought (CoT) structures, i.e., implicit stepwise reasoning in latent space, without emitting explicit natural-language reasoning steps.
Method: Using interpretability probes, including the Logit Lens and Coda Lens, we track the rank evolution of intermediate-result and final-answer tokens across hidden states, systematically analyzing representational consistency across recurrent blocks and layer-wise differences in interpretability.
Contribution/Results: We find only weak empirical evidence of latent CoT formation; interpretability depends heavily on both layer index and decoding strategy; and increasing recurrence depth yields only marginal gains on mathematical and multi-step reasoning tasks. This study provides the first empirical demonstration, in a deep recurrent architecture, of structural limitations on implicit reasoning trajectories in latent space. Our findings constitute critical negative evidence for latent-space reasoning and establish a methodological benchmark for future research on implicit reasoning in recurrent Transformers.
📝 Abstract
Chain-of-thought (CoT) reasoning has enabled transformer-based language models to excel at complex mathematics and multi-step planning. However, in standard decoder-only architectures, these reasoning steps are externalized in natural language, improving interpretability at the cost of efficiency. To capture reasoning that is not easily represented in words, many works have explored recurrent architectures that aim to internalize reasoning in latent space, potentially supporting latent CoT. In this paper, we investigate whether such reasoning structures emerge in Huginn-3.5B, a depth-recurrent Transformer that reuses layers at inference time without increasing parameter count. We examine the model's internal behavior on arithmetic tasks using a suite of probing techniques including the Logit Lens and Coda Lens. Our findings reveal limited evidence of interpretable latent CoT by tracking rank trajectories of final and intermediate result tokens. Furthermore, we uncover significant probing inconsistencies across recurrent blocks, where the interpretability of hidden states depends heavily on both the layer index and the decoding method. Finally, we empirically show that increasing recurrence depth yields only marginal gains and falls well short of models that explicitly externalize reasoning steps. The code is available at https://github.com/wenquanlu/huginn-latent-cot.
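The rank-trajectory probing described above can be illustrated with a minimal Logit-Lens-style sketch: project each intermediate hidden state through an unembedding matrix and record the rank of the answer token at every depth. The snippet below uses NumPy with random toy weights, not Huginn-3.5B's actual parameters or tokenizer; `token_rank` and the dimensions are illustrative assumptions.

```python
import numpy as np

def token_rank(hidden_state, unembed, token_id):
    """Logit-Lens-style probe: project a hidden state through the
    unembedding matrix and return the rank of a target token
    (rank 0 means the model would decode that token at this state)."""
    logits = hidden_state @ unembed  # shape: (vocab,)
    # Rank = number of tokens scoring strictly higher than the target.
    return int((logits > logits[token_id]).sum())

# Toy demonstration with random weights (NOT the real model's parameters).
rng = np.random.default_rng(0)
d_model, vocab = 64, 1000
unembed = rng.standard_normal((d_model, vocab))
target = 42  # hypothetical id of the final-answer token

# Track the target token's rank across a stack of per-depth hidden states,
# mimicking how a rank trajectory is read off a recurrent model's iterations.
hidden_states = rng.standard_normal((8, d_model))  # 8 recurrent steps
trajectory = [token_rank(h, unembed, target) for h in hidden_states]
print(trajectory)
```

A steadily falling rank across depth would suggest the answer token is forming in latent space; the paper's finding is that such clean trajectories are largely absent, and that what the probe reads out depends strongly on which layer and which decoding head (Logit Lens vs. Coda Lens) is used.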