🤖 AI Summary
To address the inefficiency of fixed-sample self-consistency in large language model (LLM) reasoning, and the accuracy degradation that results when correct answers are rare, this paper proposes a confidence-guided Bayesian adaptive early-stopping mechanism. The method uses scalar confidence signals, derived from token-level probabilities or reward models, to sequentially update a posterior distribution over candidate answers, and it terminates sampling once a candidate's posterior mass exceeds a predefined threshold. The authors provide theoretical guarantees under both perfectly calibrated and realistically noisy confidence signals. Evaluated on five mainstream reasoning benchmarks, the approach reduces the average number of LLM calls by about 69% (e.g., from 16.0 to 4.9) while matching the accuracy of self-consistency within 0.06 percentage points, yielding substantial gains in inference efficiency, particularly in low-confidence regimes.
📝 Abstract
Large language models (LLMs) are often queried multiple times at test time, with predictions aggregated by majority vote. While effective, this self-consistency strategy (arXiv:2203.11171) requires a fixed number of calls and can fail when the correct answer is rare. We introduce Confidence-Guided Early Stopping (CGES), a Bayesian framework that forms posteriors over candidate answers using scalar confidence signals derived from token probabilities or reward models. CGES adaptively halts sampling once the posterior mass of a candidate exceeds a threshold. We provide theoretical guarantees for both perfectly calibrated confidences and realistic noisy confidence signals. Across five reasoning benchmarks, CGES reduces the average number of model calls by about 69 percent (for example, from 16.0 to 4.9) while matching the accuracy of self-consistency within 0.06 percentage points.
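The stopping rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's exact update rule: it assumes a hypothetical `sample_fn()` interface that performs one LLM call and returns an `(answer, confidence)` pair, and it approximates the Bayesian update by accumulating each answer's confidence scores as unnormalized posterior mass. A `min_calls` floor (an added assumption) prevents the very first sample from trivially crossing the threshold.

```python
def cges(sample_fn, threshold=0.95, max_calls=16, min_calls=3):
    """Illustrative sketch of confidence-guided early stopping (CGES).

    sample_fn() -> (answer, confidence) is a hypothetical interface:
    one LLM call returning a candidate answer and a scalar
    confidence in (0, 1].
    Returns (best_answer, number_of_calls_used).
    """
    weights = {}  # unnormalized posterior mass per candidate answer
    for calls in range(1, max_calls + 1):
        answer, conf = sample_fn()
        # Additive confidence-weighted update: treat each sample's
        # confidence as evidence for its answer (a simplification of
        # the paper's Bayesian posterior update).
        weights[answer] = weights.get(answer, 0.0) + conf
        total = sum(weights.values())
        posterior = {a: w / total for a, w in weights.items()}
        best, mass = max(posterior.items(), key=lambda kv: kv[1])
        if calls >= min_calls and mass >= threshold:
            return best, calls  # early stop: posterior mass is high enough
    return best, max_calls  # budget exhausted: return the current mode
```

With a fixed budget of 16 calls, a confidently repeated answer stops the loop after `min_calls` samples, while an ambiguous answer distribution runs to the full budget, mirroring the adaptive call counts reported in the paper.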