🤖 AI Summary
This work addresses the limited quality of the reasoning trajectories that recursive models produce on structured reasoning tasks by proposing a training-free framework, guided stochastic exploration. Treating recursive reasoning as approximate inference over latent reasoning trajectories, the method proposes neighbouring trajectories through stochastic perturbations of the reasoning dynamics and reweights them online using the model's existing early-stopping head. The framework yields three label-free diagnostics (local stability, guide alignment, and cloud-token entropy) that predict, from inference traces alone, whether the procedure will help and which of its outputs to trust. On Sudoku-Extreme, the approach lifts exact-solve accuracy from 85.9% to 98.0% without retraining; on Maze-Hard, the diagnostics flag a misaligned guide, which validation performance later confirms.
📝 Abstract
Recent work on recursive architectures has shown that tiny neural networks can be surprisingly powerful on structured reasoning tasks. The trick is to model reasoning trajectories with a latent dynamical system. We argue that the inference-time behaviour of these architectures is best understood as approximate inference over latent reasoning trajectories, with deterministic recursion as the one-particle, zero-noise limit. We make this view operational through guided stochastic exploration: stochastic perturbations of the reasoning dynamics propose neighbouring trajectories, and the model's existing early-stopping head reweights them online. The framework yields three label-free diagnostics: local stability, guide alignment, and cloud-token entropy. These predict, from inference traces alone, whether the procedure will help and which of its outputs to trust. On Sudoku-Extreme it lifts exact-solve accuracy from $85.9\%$ to $98.0\%$ without retraining; on Maze-Hard the diagnostics flag a misaligned guide, as validation performance later confirms. The same machinery thus characterises both when recursive reasoning has room to improve at the trajectory level and when the model's internal guide can recover it.
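As a rough illustration of the propose-and-reweight loop described above, the following sketch runs a small particle cloud through a toy recursion. Everything in it is an assumption made for illustration and is not taken from the paper: the `tanh` update in `step` stands in for one pass of a recursive model, the linear `guide_score` stands in for the early-stopping head, and the Gaussian noise, softmax-style weights, and multinomial resampling are just one plausible way to realise "perturb, then reweight online". The last function shows the one-particle, zero-noise limit, which reduces to plain deterministic recursion.

```python
import math
import random


def step(z, W):
    """Toy recursive update z <- tanh(W z); a hypothetical stand-in for one model pass."""
    return [math.tanh(sum(w * x for w, x in zip(row, z))) for row in W]


def guide_score(z, v):
    """Toy linear score; a hypothetical stand-in for the early-stopping head."""
    return sum(a * b for a, b in zip(v, z))


def guided_exploration(z0, W, v, n_particles=8, noise=0.1, n_steps=16, seed=0):
    """Perturb, reweight, and resample a cloud of latent states (illustrative only)."""
    rng = random.Random(seed)
    particles = [list(z0) for _ in range(n_particles)]
    for _ in range(n_steps):
        # Propose: deterministic update plus a Gaussian perturbation per coordinate.
        particles = [
            [x + noise * rng.gauss(0.0, 1.0) for x in step(z, W)]
            for z in particles
        ]
        # Reweight: unnormalised softmax weights from the guide (assumed form).
        scores = [guide_score(z, v) for z in particles]
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        # Resample: draw the next cloud in proportion to the weights (assumed step).
        chosen = rng.choices(particles, weights=weights, k=n_particles)
        particles = [list(z) for z in chosen]
    # Return the particle the guide currently prefers.
    return max(particles, key=lambda z: guide_score(z, v))


def deterministic_recursion(z0, W, n_steps=16):
    """Plain recursion: the one-particle, zero-noise limit of the loop above."""
    z = list(z0)
    for _ in range(n_steps):
        z = step(z, W)
    return z
```

Calling `guided_exploration(z0, W, v, n_particles=1, noise=0.0)` returns the same state as `deterministic_recursion(z0, W)`, which is the sense in which deterministic recursion is a special case; larger clouds and non-zero noise explore neighbouring trajectories around it.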