A Probabilistic Inference Scaling Theory for LLM Self-Correction

📅 2025-08-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior work has not characterized how accuracy dynamically evolves during multi-turn self-correction in large language models (LLMs). Method: We propose the first probabilistic reasoning framework to formally model the convergence behavior of accuracy during self-correction, enabling prediction of the full performance improvement curve from a single correction step. Our approach integrates Bayesian inference with rigorous mathematical derivation. Contributions/Results: We systematically validate the framework across major LLMs—including GPT-4, Claude, and Llama—and diverse benchmarks (GSM8K, MMLU, HumanEval). Empirical results align closely with theoretical predictions (mean absolute error <2.1%), revealing for the first time an exponential convergence pattern in LLM self-correction. This provides a principled, interpretable, and quantifiable analytical paradigm for trustworthy AI—enabling both diagnostic insight and optimization guidance for iterative refinement strategies.

📝 Abstract
Large Language Models (LLMs) have demonstrated the capability to refine their generated answers through self-correction, enabling continuous performance improvement over multiple rounds. However, the mechanisms underlying how and why accuracy evolves during this iterative process remain unexplored. To fill this gap, we propose a probabilistic theory to model the dynamics of accuracy change and explain the performance improvements observed in multi-round self-correction. Through mathematical derivation, we establish that the accuracy after the $t^{th}$ round of self-correction is given by: $Acc_t = Upp - \alpha^t(Upp - Acc_0),$ where $Acc_0$ denotes the initial accuracy, $Upp$ represents the upper bound of accuracy convergence, and $\alpha$ determines the rate of convergence. Based on our theory, these parameters can be calculated, and the predicted accuracy curve can then be obtained from only a single round of self-correction. Extensive experiments across diverse models and datasets demonstrate that our theoretical predictions align closely with empirical accuracy curves, validating the effectiveness of the theory. Our work provides a theoretical foundation for understanding LLM self-correction, thus paving the way for further explorations.
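The exponential form above follows from a simple two-state transition view of self-correction. As a minimal sketch (the transition probabilities `p` and `q` are hypothetical names, not the paper's notation): if `p` is the probability a correct answer stays correct after one round and `q` the probability a wrong answer gets corrected, then the recursion $Acc_t = p \cdot Acc_{t-1} + q \cdot (1 - Acc_{t-1})$ has exactly the closed form in the abstract, with $Upp = q/(1-p+q)$ and $\alpha = p - q$:

```python
# Sketch of the convergence law Acc_t = Upp - alpha^t (Upp - Acc_0),
# re-derived from a two-state transition model. p and q are hypothetical
# per-round transition probabilities (not the paper's notation); in
# principle they could be estimated from a single self-correction round.

def predict_accuracy(acc0: float, p: float, q: float, t: int) -> float:
    """Closed-form accuracy after t rounds of self-correction."""
    upp = q / (1.0 - p + q)   # fixed point: upper bound of convergence
    alpha = p - q             # per-round convergence rate
    return upp - alpha**t * (upp - acc0)

def simulate_accuracy(acc0: float, p: float, q: float, t: int) -> float:
    """Same quantity via the step-by-step recursion, for cross-checking."""
    acc = acc0
    for _ in range(t):
        acc = acc * p + (1.0 - acc) * q
    return acc
```

Under illustrative values such as `acc0=0.6`, `p=0.95`, `q=0.30`, both functions agree round for round, and the curve rises exponentially toward `Upp = 0.30 / 0.35`, matching the convergence pattern the paper reports.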
Problem

Research questions and friction points this paper is trying to address.

Modeling accuracy dynamics in LLM self-correction process
Explaining performance improvements through probabilistic scaling theory
Predicting multi-round accuracy curves from single-round data
Innovation

Methods, ideas, or system contributions that make the work stand out.

Probabilistic theory models accuracy dynamics
Predicts multi-round self-correction from single round
Mathematical derivation defines convergence rate parameters
Zhe Yang
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Yichang Zhang
Qwen Team, Alibaba Group
NLP · Reinforcement Learning · Deep Learning · Machine Learning · Artificial Intelligence
Yudong Wang
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University
Ziyao Xu
Peking University
Junyang Lin
Qwen Team, Alibaba Group & Peking University
Natural Language Processing · Cross-Modal Representation Learning · Pretraining
Zhifang Sui
State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University