Boosting Self-Consistency with Ranking

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of standard self-consistency: majority voting often fails to select the correct answer even when that answer is already present among the sampled reasoning paths. The authors propose Ranking-Improved Self-Consistency (RISC), a framework that reformulates answer selection as a learning-to-rank problem and combines multiple signals instead of relying on a single uncertainty or confidence score. RISC uses a lightweight LambdaRank model to score candidate answers with five features that capture answer frequency, semantic centrality, and reasoning-trace consistency. Evaluated on three datasets under a range of test-time budgets, RISC consistently achieves a better accuracy-efficiency trade-off than standard self-consistency and strong baselines, with particularly large gains on question-answering benchmarks. Further analysis shows that the features are individually useful and complementary, which supports learning to combine several signals for test-time answer selection.
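To make the contrast concrete, the following minimal Python sketch shows majority voting alongside per-candidate features of the three kinds the summary names. It is not the paper's feature set. The exact five features are not described here, so `candidate_features`, the three proxies it computes, and the token-overlap `similarity` function (a stand-in for embedding-based semantic similarity) are all hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def majority_vote(answers):
    """Standard self-consistency: return the most frequent final answer."""
    return Counter(answers).most_common(1)[0][0]


def similarity(a, b):
    """Hypothetical stand-in for embedding similarity: token-set Jaccard overlap."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def candidate_features(samples):
    """Group sampled (reasoning_trace, final_answer) pairs by answer and return
    {answer: [frequency, centrality, consistency]} using hypothetical proxies."""
    n = len(samples)
    traces = [trace for trace, _ in samples]
    by_answer = defaultdict(list)
    for i, (_, answer) in enumerate(samples):
        by_answer[answer].append(i)

    features = {}
    for answer, idx in by_answer.items():
        # Frequency: share of samples ending in this answer (all majority voting uses).
        frequency = len(idx) / n
        # Centrality: mean similarity of this answer's traces to every other trace.
        sims = [similarity(traces[i], traces[j]) for i in idx for j in range(n) if j != i]
        centrality = sum(sims) / len(sims) if sims else 0.0
        # Consistency: mean pairwise similarity among traces that agree on this answer.
        pairs = list(combinations(idx, 2))
        consistency = (
            sum(similarity(traces[i], traces[j]) for i, j in pairs) / len(pairs)
            if pairs else 0.0
        )
        features[answer] = [frequency, centrality, consistency]
    return features


samples = [
    ("2 plus 3 is 5", "5"),
    ("add 2 and 3 to get 5", "5"),
    ("2 times 3 is 6", "6"),
]
print(majority_vote([answer for _, answer in samples]))  # 5
print(candidate_features(samples))
```

Majority voting looks only at the first column; a ranker can also weigh the other columns, which is what lets it recover a correct answer that is not the most frequent one.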

📝 Abstract

Self-consistency improves large language models by sampling multiple reasoning paths and selecting the most frequent answer, but majority voting often fails to recover correct answers that are already present among the samples. We address this limitation with Ranking-Improved Self-Consistency (RISC), which reformulates answer selection in self-consistency as a ranking problem. Instead of relying on a single uncertainty or confidence signal, RISC uses a lightweight LambdaRank model to score candidate answers with five carefully designed features that capture answer frequency, semantic centrality, and reasoning-trace consistency. We evaluate RISC on three datasets under a range of test-time budgets. Across datasets, RISC consistently achieves a better accuracy-efficiency trade-off than standard self-consistency and strong baselines, with particularly large gains on question answering benchmarks. Further analysis shows that the proposed features are individually useful and, more importantly, complementary, highlighting the value of learning to combine multiple informative signals for test-time answer selection.
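The abstract states that candidates are scored by a lightweight LambdaRank model but does not specify the model class or training setup. The sketch below is a minimal illustration under stated assumptions: a linear scorer, binary relevance labels (1 if a candidate answer matches the gold answer, 0 otherwise), and the standard LambdaRank recipe of weighting RankNet pairwise gradients by the |ΔNDCG| of swapping two candidates. The function names, toy feature values, learning rate, and epoch count are hypothetical. A real system would more likely use a library implementation, such as the `lambdarank` objective in LightGBM.

```python
import math


def gain(label):
    """NDCG gain for a relevance label (1 = correct answer, 0 = incorrect)."""
    return 2 ** label - 1


def score(w, x):
    """Linear scorer s = w . x (an assumption; the paper's model may differ)."""
    return sum(wk * xk for wk, xk in zip(w, x))


def lambdarank_epoch(groups, w, lr=0.1):
    """One pass of LambdaRank gradient ascent over questions.

    groups: list of (X, y), one per question, where X holds a feature vector per
    candidate answer and y holds its relevance label.
    """
    for X, y in groups:
        ideal = sorted(y, reverse=True)
        idcg = sum(gain(r) / math.log2(k + 2) for k, r in enumerate(ideal))
        if idcg == 0:
            continue  # no correct candidate was sampled: nothing to rank
        s = [score(w, x) for x in X]
        order = sorted(range(len(X)), key=lambda i: -s[i])
        rank = {i: r for r, i in enumerate(order)}  # 0-based position per candidate
        grad = [0.0] * len(w)
        for i in range(len(X)):
            for j in range(len(X)):
                if y[i] <= y[j]:
                    continue  # only pairs where candidate i should outrank j
                # |delta NDCG| if i and j swapped positions in the current ranking
                disc_i = 1 / math.log2(rank[i] + 2)
                disc_j = 1 / math.log2(rank[j] + 2)
                delta = abs((gain(y[i]) - gain(y[j])) * (disc_i - disc_j)) / idcg
                # RankNet pairwise gradient sigma(s_j - s_i), weighted by |delta NDCG|
                lam = delta / (1 + math.exp(s[i] - s[j]))
                for k in range(len(w)):
                    grad[k] += lam * (X[i][k] - X[j][k])
        w = [wk + lr * gk for wk, gk in zip(w, grad)]
    return w


def select_answer(candidates, w):
    """candidates: {answer: feature_vector}; return the highest-scoring answer."""
    return max(candidates, key=lambda a: score(w, candidates[a]))


# Toy training data with hypothetical features [frequency, centrality, consistency];
# in both questions the correct answer (label 1) is the less frequent candidate.
groups = [
    ([[0.6, 0.3, 0.2], [0.4, 0.5, 0.9]], [0, 1]),
    ([[0.7, 0.2, 0.1], [0.3, 0.4, 0.8]], [0, 1]),
]
w = [0.0, 0.0, 0.0]
for _ in range(50):
    w = lambdarank_epoch(groups, w)

print(select_answer({"A": [0.6, 0.3, 0.2], "B": [0.4, 0.5, 0.9]}, w))  # B
```

At test time `select_answer` takes the place of majority voting: the candidate with the highest learned score is returned, so a less frequent answer can win when the other features outweigh frequency, which is the failure mode the abstract describes.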

Problem

Research questions and friction points this paper is trying to address.

self-consistency

answer selection

majority voting

large language models

reasoning paths

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Consistency

Answer Ranking

LambdaRank