Cognitive Fatigue in Autoregressive Transformers: Formalization and Measurement

📅 2026-05-29
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses the performance degradation of autoregressive language models during long-sequence generation, which manifests as repetition, weakened instruction following, and unstable entropy, and for which real-time diagnostic tools are lacking. The authors formalize this phenomenon as “cognitive fatigue” and characterize it along three dimensions: decay of attention to the original prompt, representational drift, and entropy miscalibration. They propose a lightweight, model-agnostic Fatigue Index (FI) that aggregates these signals under explicit axioms (monotonicity, boundedness, and interpretability), enabling online monitoring. Evaluated across nine models ranging from 1B to 13B parameters, FI predicts task performance degradation (AUROC = 0.95) and textual repetition (Spearman ρ = 0.94). The analysis further reveals non-monotonic scaling behavior: instruction-tuned models below 3B parameters collapse faster than base models, and this trend reverses at 7B.
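To make the idea of aggregating three bounded signals concrete, here is a minimal sketch in Python. It is not the paper's definition of FI: the summary does not give the formula, so the component normalizations, the equal default weights, and every function name below are assumptions chosen only so that the result is bounded in [0, 1] and non-decreasing in each component.

```python
import math


def clip01(x):
    """Clamp a value into [0, 1]."""
    return max(0.0, min(1.0, x))


def attention_decay(prompt_mass_now, prompt_mass_start):
    """Assumed signal: fraction of the initial prompt attention mass lost."""
    if prompt_mass_start <= 0:
        return 0.0
    return clip01(1.0 - prompt_mass_now / prompt_mass_start)


def representational_drift(h_now, h_ref):
    """Assumed signal: cosine distance to a reference hidden state,
    rescaled from [0, 2] to [0, 1]."""
    dot = sum(a * b for a, b in zip(h_now, h_ref))
    norm = math.sqrt(sum(a * a for a in h_now)) * math.sqrt(sum(b * b for b in h_ref))
    if norm == 0:
        return 0.0
    return clip01((1.0 - dot / norm) / 2.0)


def entropy_miscalibration(probs, reference_entropy):
    """Assumed signal: |H(p) - H_ref| divided by the maximum entropy log(V)."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    max_entropy = math.log(len(probs))
    if max_entropy == 0:
        return 0.0
    return clip01(abs(entropy - reference_entropy) / max_entropy)


def fatigue_index(decay, drift, miscal, weights=(1.0, 1.0, 1.0)):
    """Hypothetical aggregate: weighted mean of the three signals.

    With non-negative weights the result stays in [0, 1] (boundedness)
    and never decreases when any single component increases
    (one possible reading of monotonicity).
    """
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    components = (clip01(decay), clip01(drift), clip01(miscal))
    weighted = sum(w * c for w, c in zip(weights, components))
    return clip01(weighted / sum(weights))
```

A weighted mean is used here because each component's contribution can be read off directly from its weight, which is one simple way to keep the score interpretable; the paper may aggregate the signals differently.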
📝 Abstract
Autoregressive language models frequently degrade during long-horizon generation, producing repetitive text, losing instruction adherence, and exhibiting unstable entropy. Despite the prevalence of these failures, practitioners lack online diagnostics to detect them as they occur. We formalize this degradation as cognitive fatigue, a measurable generation-time state characterized by decay in attention to the original prompt, representational drift, and entropy miscalibration. We introduce the Fatigue Index (FI), a lightweight, model-agnostic diagnostic that aggregates these three signals under explicit axioms (monotonicity, boundedness, interpretability), enabling reliable runtime monitoring. Across nine models (1B to 13B parameters), FI trajectories exhibit structured temporal dynamics, predict task degradation (AUROC = 0.95) and repetition (Spearman ρ = 0.94), and reveal non-monotonic scaling behavior: instruction-tuned models below 3B exhibit faster collapse than base models, with this trend reversing at 7B. Stress analyses further show that FI onset accelerates under longer contexts, middle-positioned evidence, and reduced numerical precision. These results establish cognitive fatigue as a coherent and measurable phenomenon, and position FI as a principled tool for runtime reliability monitoring in production LLM systems.
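The two reported metrics can be illustrated with a small, dependency-free Python sketch. It shows how an AUROC (FI score against a binary "task degraded" label) and a Spearman ρ (FI score against a repetition measure) are computed in general. The function names are hypothetical and the paper's actual evaluation pipeline is not described here, so treat this as a reading aid for the numbers rather than a reproduction of them.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def auroc(scores, labels):
    """AUROC via the Mann-Whitney U statistic; labels are 0 or 1."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    ranks = average_ranks(scores)
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2.0
    return u_stat / (n_pos * n_neg)


def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rank variance is zero; correlation is undefined")
    return cov / math.sqrt(var_x * var_y)
```

Read this way, an AUROC of 0.95 means that a randomly chosen degraded generation receives a higher FI than a randomly chosen non-degraded one about 95% of the time, and a Spearman ρ of 0.94 means that FI and repetition rise together in nearly the same rank order.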
Problem

Research questions and friction points this paper is trying to address.

cognitive fatigue
autoregressive transformers
generation degradation
real-time diagnostics
language models
Innovation

Methods, ideas, or system contributions that make the work stand out.

cognitive fatigue
Fatigue Index
autoregressive transformers
runtime monitoring
representational drift