Recurrent Confidence Chain: Temporal-Aware Uncertainty Quantification in Large Language Models

📅 2026-01-19
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the overconfidence of large language models in reasoning, which often stems from neglecting early low-confidence steps and leads to hallucinations. To mitigate this, the paper introduces a time-aware confidence propagation mechanism and proposes a recursive chain-of-confidence architecture. This approach leverages cross-step attention to model semantic dependencies among reasoning steps and integrates hidden confidence states with a confidence fusion strategy to effectively capture both decay and accumulation of confidence over long reasoning chains. Evaluated on the GAOKAO-Math and CLadder causal reasoning benchmarks, the method significantly outperforms existing approaches, achieving superior predictive accuracy and better uncertainty calibration as measured by negative log-likelihood and expected calibration error.

📝 Abstract
As reasoning modules such as the chain-of-thought mechanism are applied to large language models, these models achieve strong performance on tasks ranging from common-sense question answering to math problem solving. A key remaining challenge is assessing the uncertainty of their answers, which helps protect users from misleading or serious hallucinations. Although current methods analyze long reasoning sequences by filtering unrelated tokens and examining connections between nearby tokens or sentences, they often overlook how confidence spreads over time. This oversight can inflate overall confidence even when earlier steps exhibit very low confidence. To address this issue, we propose a novel method that incorporates inter-step attention to model semantic correlations across reasoning steps. For long-horizon responses, we introduce a hidden confidence mechanism that retains historical confidence information and combines it with stepwise confidence to produce a more accurate overall estimate. We evaluate our method on the GAOKAO math benchmark and the CLadder causal reasoning dataset using mainstream open-source large language models. Our approach outperforms state-of-the-art methods, achieving a superior balance between predictive quality and calibration as measured by negative log-likelihood and expected calibration error.
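The abstract's mechanism can be illustrated with a minimal sketch. This is not the paper's exact formulation: the function name `chain_confidence`, the dot-product attention, the `decay` parameter, and the `min` fusion rule are all illustrative assumptions standing in for the described cross-step attention, hidden confidence state, and confidence fusion, chosen so that an early low-confidence step keeps pulling down the chain-level estimate.

```python
import math

def chain_confidence(step_confidences, step_embeddings, decay=0.9):
    # Hypothetical sketch: fuse per-step confidences into one chain-level
    # estimate via cross-step attention plus a recurrent hidden state.
    hidden = step_confidences[0]
    for t in range(1, len(step_confidences)):
        # Cross-step attention: softmax over dot-product similarity of
        # earlier step embeddings with the current step's embedding.
        sims = [sum(a * b for a, b in zip(step_embeddings[i], step_embeddings[t]))
                for i in range(t)]
        m = max(sims)
        weights = [math.exp(s - m) for s in sims]
        z = sum(weights)
        context = sum(w / z * c for w, c in zip(weights, step_confidences[:t]))
        # Recurrent fusion: decayed history blended with the weaker of the
        # current step's own confidence and its attended historical context,
        # so one shaky early step continues to drag the chain down.
        hidden = decay * hidden + (1 - decay) * min(step_confidences[t], context)
    return hidden
```

With orthogonal step embeddings, a chain of uniformly high confidences stays high, while the same chain preceded by one low-confidence step yields a markedly lower overall estimate, mirroring the decay-and-accumulation behavior the paper targets.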
Problem

Research questions and friction points this paper is trying to address.

uncertainty quantification
large language models
temporal confidence
hallucination
reasoning chains
Innovation

Methods, ideas, or system contributions that make the work stand out.

Temporal-aware uncertainty quantification
Inter-step attention
Hidden confidence mechanism
Confidence calibration
Chain-of-thought reasoning
Zhenjiang Mao
University of Florida
Anirudhh Venkat
University of Florida