How Language Models Fail: Token-Level Signatures of Committed and Persistent Reasoning Failures

Date: 2026-06-04
AI Summary
This work identifies two distinct failure modes in large language model reasoning: committed failures and persistently uncertain failures. It proposes a fine-grained diagnostic method based on token-level uncertainty signals to detect signatures of these failure types within reasoning trajectories, revealing when a failure becomes detectable. Using a cross-model, cross-dataset validation framework, the study reproduces these failure signatures across 23 configurations, with the framework's predictions holding in 20 of them, well above chance. Building on these insights, the authors adapt self-consistency strategies, identifying when uncertainty signals complement self-consistency and when it can be selectively skipped.
๐Ÿ“ Abstract
Failures in language model reasoning emerge through distinct processes that leave identifiable signatures in the reasoning trace. We characterize these failures using token-level uncertainty signals, finding that they arise through two empirically distinguishable processes. The first is committed failure, in which a model locks onto an incorrect reasoning path early in its trace. A central diagnostic signature is the commitment point, beyond which considering additional tokens hurts rather than helps failure detection. In the second, persistent uncertainty, uncertainty instead accumulates throughout the trace, and the full trace is needed to best distinguish failing from successful completions. These signatures reproduce across 23 model-dataset configurations, with the framework's falsifiable predictions holding in 20 of 23 cases, well above chance across both failure modes. Finally, we demonstrate that our failure-mode framework has direct implications for self-consistency, identifying when uncertainty signals complement it and when it can be selectively skipped. These results offer a foundation for understanding when LLM reasoning failures become detectable and for adapting detection strategies accordingly.
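To make the commitment-point idea concrete, here is a minimal sketch of how such a point could be located. It is not the authors' method. It assumes that the uncertainty signal is per-token entropy, that a trace prefix is scored by its mean entropy, and that detection quality is measured by AUROC. All function names and the toy numbers are hypothetical.

```python
from statistics import mean


def auroc(failed_scores, succeeded_scores):
    """Chance that a random failed trace outscores a random successful one.

    Ties count as half a win; 0.5 means the score carries no signal.
    """
    wins = 0.0
    for f in failed_scores:
        for s in succeeded_scores:
            if f > s:
                wins += 1.0
            elif f == s:
                wins += 0.5
    return wins / (len(failed_scores) * len(succeeded_scores))


def prefix_score(token_entropies, k):
    """Mean per-token entropy over the first k tokens of one trace."""
    return mean(token_entropies[:k])


def detection_curve(failed, succeeded, max_k):
    """AUROC of the prefix score for every prefix length 1..max_k."""
    curve = []
    for k in range(1, max_k + 1):
        f_scores = [prefix_score(t, k) for t in failed]
        s_scores = [prefix_score(t, k) for t in succeeded]
        curve.append(auroc(f_scores, s_scores))
    return curve


def best_prefix_length(curve):
    """Shortest prefix length (1-indexed) that attains the peak AUROC."""
    return curve.index(max(curve)) + 1


# Toy, hand-made entropies (an assumption, NOT data from the paper):
# failed traces spike in uncertainty early, then become confidently wrong.
failed = [
    [0.5, 1.4, 1.2] + [0.1] * 7,
    [0.4, 1.3, 1.5] + [0.1] * 7,
]
succeeded = [
    [0.5, 0.6] * 5,
    [0.4, 0.5, 0.6, 0.5, 0.4, 0.5, 0.6, 0.5, 0.4, 0.5],
]

curve = detection_curve(failed, succeeded, max_k=10)
print("AUROC by prefix length:", curve)
print("estimated commitment point:", best_prefix_length(curve))
```

On this toy data the curve peaks after a few tokens and then falls as the low-entropy tail dilutes the early signal, which is the qualitative pattern the abstract attributes to committed failure. Under persistent uncertainty, the same curve would be expected to peak at or near the full trace length.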
Problem

Research questions and friction points this paper is trying to address.

language models
reasoning failures
token-level uncertainty
committed failure
persistent uncertainty
Innovation

Methods, ideas, or system contributions that make the work stand out.

token-level uncertainty
committed failure
persistent uncertainty
reasoning trace
failure detection