Boosting Self-Consistency with Ranking

📅 2026-06-03

🤖 AI Summary
This work addresses a limitation of standard self-consistency: majority voting often fails to select the correct answer even when that answer is already present among the samples. The authors propose Ranking-Improved Self-Consistency (RISC), which reformulates answer selection as a learning-to-rank problem that combines multiple signals instead of relying on a single uncertainty or confidence score. RISC uses a lightweight LambdaRank model to score candidate answers with five features that capture answer frequency, semantic centrality, and reasoning-trace consistency. Evaluated on three datasets under a range of test-time budgets, RISC consistently achieves a better accuracy-efficiency trade-off than standard self-consistency and strong baselines, with particularly large gains on question-answering benchmarks. Further analysis shows that the features are individually useful and complementary, supporting the value of learning to combine multiple signals for answer selection.
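The contrast between majority voting and feature-based selection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the summary does not define the five RISC features, so the three computed here (`frequency`, a token-Jaccard `centrality` standing in for embedding-based semantic centrality, and `consistency` among traces that reach the same answer) are hypothetical, as are the function names and the toy samples.

```python
from collections import Counter
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Token-set Jaccard similarity: a cheap lexical stand-in for semantic similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def majority_vote(samples):
    """Standard self-consistency: pick the most frequent final answer.
    Counter.most_common breaks ties by first occurrence."""
    return Counter(ans for _, ans in samples).most_common(1)[0][0]


def candidate_features(samples):
    """Map each distinct answer to a hypothetical [frequency, centrality, consistency] vector."""
    n = len(samples)
    by_answer = {}  # answer -> indices of the samples that produced it
    for i, (_, ans) in enumerate(samples):
        by_answer.setdefault(ans, []).append(i)
    feats = {}
    for ans, idx in by_answer.items():
        frequency = len(idx) / n
        # Centrality: mean similarity of this answer's traces to every other sampled trace.
        cross = [jaccard(samples[i][0], samples[j][0]) for i in idx for j in range(n) if j != i]
        centrality = sum(cross) / len(cross) if cross else 0.0
        # Consistency: mean pairwise similarity among traces that agree on this answer
        # (0.0 when only one trace supports it).
        pairs = [jaccard(samples[i][0], samples[j][0]) for i, j in combinations(idx, 2)]
        consistency = sum(pairs) / len(pairs) if pairs else 0.0
        feats[ans] = [frequency, centrality, consistency]
    return feats


# Toy (reasoning trace, final answer) pairs, made up for illustration.
samples = [
    ("7 times 6 is 42 so the answer is 42", "42"),
    ("6 times 7 equals 42", "42"),
    ("7 times 6 is 36", "36"),
]
print(majority_vote(samples))       # the most frequent answer
print(candidate_features(samples))  # one feature vector per distinct answer
```

Majority voting uses only the first of these signals. A learned ranker can weigh all of them, which is how it could, in principle, prefer a less frequent answer when the other signals favor it.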
📝 Abstract
Self-consistency improves large language models by sampling multiple reasoning paths and selecting the most frequent answer, but majority voting often fails to recover correct answers that are already present among the samples. We address this limitation with Ranking-Improved Self-Consistency (RISC), which reformulates answer selection in self-consistency as a ranking problem. Instead of relying on a single uncertainty or confidence signal, RISC uses a lightweight LambdaRank model to score candidate answers with five carefully designed features that capture answer frequency, semantic centrality, and reasoning-trace consistency. We evaluate RISC on three datasets under a range of test-time budgets. Across datasets, RISC consistently achieves a better accuracy-efficiency trade-off than standard self-consistency and strong baselines, with particularly large gains on question answering benchmarks. Further analysis shows that the proposed features are individually useful and, more importantly, complementary, highlighting the value of learning to combine multiple informative signals for test-time answer selection.
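The ranking step itself can be sketched as follows. The abstract specifies a lightweight LambdaRank model but not its model class, features, or hyperparameters, so this stdlib-only sketch assumes a linear scorer s = w·x trained with LambdaRank-style gradients (the RankNet pairwise gradient scaled by the |ΔNDCG| of swapping two candidates). The helper names (`lambdarank_epoch`, `select_answer`), the three-dimensional feature vectors, the training groups, and the learning rate are all hypothetical toy choices, not values from the paper.

```python
import math


def score(w, x):
    """Linear scoring function s = w . x."""
    return sum(wk * xk for wk, xk in zip(w, x))


def discount(rank):
    """NDCG position discount 1 / log2(1 + rank) for 1-based ranks."""
    return 1.0 / math.log2(1 + rank)


def lambdarank_epoch(groups, w, lr=0.1, sigma=1.0):
    """One pass of LambdaRank-style updates over (features, labels) groups.
    Each group holds one question's candidates; labels are 1 = correct, 0 = incorrect."""
    for feats, labels in groups:
        scores = [score(w, x) for x in feats]
        order = sorted(range(len(feats)), key=lambda i: -scores[i])
        rank = {i: r + 1 for r, i in enumerate(order)}  # current 1-based ranks
        idcg = sum((2 ** lab - 1) * discount(r + 1)
                   for r, lab in enumerate(sorted(labels, reverse=True)))
        if idcg == 0:
            continue  # no correct candidate in this group: no training signal
        grad = [0.0] * len(w)
        for i in range(len(feats)):
            for j in range(len(feats)):
                if labels[i] <= labels[j]:
                    continue  # only pairs where i should outrank j
                # |Delta NDCG| if i and j swapped positions in the current ranking.
                delta = abs((2 ** labels[i] - 2 ** labels[j])
                            * (discount(rank[i]) - discount(rank[j]))) / idcg
                # RankNet gradient magnitude; exponent clamped to avoid overflow.
                z = min(sigma * (scores[i] - scores[j]), 50.0)
                rho = sigma / (1.0 + math.exp(z))
                for k in range(len(w)):
                    grad[k] += rho * delta * (feats[i][k] - feats[j][k])
        # Step that decreases the |Delta NDCG|-weighted pairwise loss.
        w = [wk + lr * gk for wk, gk in zip(w, grad)]
    return w


def select_answer(candidates, w):
    """Return the candidate answer whose feature vector scores highest."""
    return max(candidates, key=lambda a: score(w, candidates[a]))


# Toy training groups with [frequency, centrality, consistency] features (made up).
groups = [
    ([[0.6, 0.3, 0.2], [0.4, 0.5, 0.9]], [0, 1]),  # correct answer is the minority
    ([[0.7, 0.6, 0.8], [0.3, 0.2, 0.1]], [1, 0]),
    ([[0.5, 0.2, 0.3], [0.5, 0.6, 0.8]], [0, 1]),
]
w = [0.0, 0.0, 0.0]
for _ in range(50):
    w = lambdarank_epoch(groups, w)

candidates = {"A": [0.6, 0.3, 0.2], "B": [0.4, 0.5, 0.9]}
print(w, select_answer(candidates, w))
```

In this toy setup, majority voting would pick `A` (the higher-frequency candidate), while the learned weights pick `B`, which has higher centrality and consistency. A practical implementation might instead use an off-the-shelf LambdaRank objective (for example, LightGBM's `objective="lambdarank"`); the abstract does not say which.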
Problem

Research questions and friction points this paper is trying to address.

self-consistency
answer selection
majority voting
large language models
reasoning paths
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Consistency
Answer Ranking
LambdaRank
Reasoning Paths
Test-Time Computation