Failed Reasoning Traces Tell You What Is Fixable (But Not by Reading Them)

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

When a reasoning attempt fails, language models are usually just resampled, and the failed reasoning traces are discarded even though they carry information about whether the failure can be repaired. This work treats failed traces as diagnostic objects and introduces three problem-level distributional features, derived from the structure of available interventions, that distinguish stochastic failures (where more rollouts help) from structural ones (which resist resampling) at test time, without additional training. A training-free routing rule then sends each failure to an appropriate repair strategy. The features characterize the failure topography of different post-training methods with 84.3±4.3% accuracy, +20% over a majority-class baseline, and the routing rule lifts rescue by +12.2% on the Steerable-Hard subset. Both the features and the routing rule transfer across two cross-family probes.

📝 Abstract

When post-trained language models fail on reasoning problems, the common test-time-scaling response is to spend more compute on additional attempts, and the failed traces play no further role. We argue this discards a crucial signal; some failures come from unlucky sampling, where more rollouts help, while others are structural and resist resampling regardless of budget. We propose that failed traces encode recoverability structure: the inference-time signature of which test-time interventions can rescue a given failure. Three problem-level trajectory features, derived from the structure of available interventions, recover this structure from the distributional signature of failed rollouts, not their text. They cluster failures into stable regimes, characterize the failure topography of different post-training methods ($84.3{\pm}4.3\%$ accuracy, $+20\%$ over a majority-class baseline), and support a training-free routing rule that lifts rescue by $+12.2\%$ on the deployment-relevant Steerable-Hard subset (failures where retry is insufficient and a bounded intervention is reachable). The features and the routing rule transfer across two cross-family probes. The same three features thus convert failed traces from discarded data into a diagnostic object, supporting test-time routing and post-training analysis without training-time or weight-space access.
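The abstract does not name its three features or state the routing rule, so the following is only an illustrative sketch of the general idea: compute statistics over the distribution of failed rollouts (not their text) and use them to choose between resampling and intervening. The two statistics (`answer_entropy`, `top_answer_share`), the threshold, and the route labels are all assumptions made for illustration, not the paper's method.

```python
from collections import Counter
from math import log2


def answer_entropy(final_answers):
    """Shannon entropy (bits) of final answers across failed rollouts.

    Hypothetical stand-in feature; the paper's actual features are not
    specified in the abstract.
    """
    counts = Counter(final_answers)
    total = len(final_answers)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def top_answer_share(final_answers):
    """Fraction of failed rollouts that agree on the most common answer."""
    counts = Counter(final_answers)
    return counts.most_common(1)[0][1] / len(final_answers)


def route_failure(final_answers, share_threshold=0.6):
    """Hypothetical training-free routing rule.

    Assumption: if failed rollouts concentrate on a single wrong answer,
    the failure looks structural and more sampling is unlikely to help,
    so route to an intervention. If they scatter, treat the failure as
    stochastic and resample. The threshold value is arbitrary.
    """
    if not final_answers:
        raise ValueError("need at least one failed rollout")
    if top_answer_share(final_answers) >= share_threshold:
        return "intervene"
    return "resample"


# Usage: eight failed rollouts for one problem, seven agreeing on "12".
concentrated = ["12"] * 7 + ["9"]
scattered = ["1", "2", "3", "4"]
print(route_failure(concentrated), route_failure(scattered))
```

The point of the sketch is only that the decision uses problem-level aggregates over many rollouts, which matches the abstract's claim that the signal lies in the distributional signature of failures and cannot be recovered by reading any single trace.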

Problem

Research questions and friction points this paper is trying to address.

reasoning failures

failed traces

test-time intervention

recoverability

post-trained language models

Innovation

Methods, ideas, or system contributions that make the work stand out.

failed reasoning traces

recoverability structure

test-time routing