EviRank: Evidence-Based Confidence Estimation for LLM-Based Ranking

📅 2026-06-03
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses a limitation of existing large language models (LLMs) in ranking tasks: they do not provide fine-grained, position-aware assessments of result reliability. The authors propose an evidence-based confidence estimation method that extracts three complementary forms of evidence from a single forward pass. The method combines reliable opinion aggregation with a position-aware calibration mechanism to quantify uncertainty at each position of the ranked output. Evaluated on three benchmark datasets, it reports state-of-the-art performance in both recommendation accuracy and uncertainty estimation, with the aim of making ranking systems more trustworthy.
📝 Abstract
Large Language Models (LLMs) show promise for recommendation, but they raise reliability concerns due to limited domain coverage and inherent stochasticity. Existing uncertainty quantification methods face two fundamental challenges: (1) a global confidence score designed for question answering fails to reveal which positions in a ranking list are unreliable; (2) fine-grained confidence extracted from model internals exhibits uniformly low values across all positions, making it impossible to filter out unreliable predictions. To tackle these challenges, we propose EviRank, an evidence-based confidence estimation method for LLM-based ranking. We extract three complementary forms of evidence from a single forward pass and aggregate them via reliable opinion aggregation. Furthermore, we recognize that ranking positions are inherently unequal and introduce a position-aware calibration. Finally, the calibrated confidence guides ranking optimization. Experiments on three datasets demonstrate that our method achieves state-of-the-art performance on both recommendation and uncertainty quantification.
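The abstract does not say what the three evidence sources are, which aggregation rule is used, or what form the calibration takes. The sketch below is therefore only an illustration of one common way to aggregate evidence into a per-position confidence, and it should not be read as the authors' method. It assumes that each evidence source yields positive and negative support counts for every ranked position, and it uses the cumulative fusion rule for binomial opinions from subjective logic. The names `Opinion`, `from_evidence`, `fuse`, and `position_confidence` are hypothetical, and the position-aware calibration step is left out because the abstract gives no detail on it.

```python
from dataclasses import dataclass
from functools import reduce

# Non-informative prior weight commonly used in subjective logic.
W = 2.0


@dataclass(frozen=True)
class Opinion:
    """Binomial opinion: belief + disbelief + uncertainty == 1."""
    belief: float
    disbelief: float
    uncertainty: float

    def expected(self, base_rate: float = 0.5) -> float:
        # Projected probability: belief plus a share of the uncertainty mass.
        return self.belief + base_rate * self.uncertainty


def from_evidence(pos: float, neg: float) -> Opinion:
    """Map positive/negative evidence counts to an opinion (assumed form)."""
    total = pos + neg + W
    return Opinion(pos / total, neg / total, W / total)


def fuse(a: Opinion, b: Opinion) -> Opinion:
    """Cumulative fusion of two opinions; both must have uncertainty > 0."""
    denom = a.uncertainty + b.uncertainty - a.uncertainty * b.uncertainty
    return Opinion(
        (a.belief * b.uncertainty + b.belief * a.uncertainty) / denom,
        (a.disbelief * b.uncertainty + b.disbelief * a.uncertainty) / denom,
        (a.uncertainty * b.uncertainty) / denom,
    )


def position_confidence(evidence_per_position):
    """Return one fused confidence per ranked position.

    evidence_per_position: list with one entry per rank; each entry is a
    list of (pos, neg) pairs, one pair per evidence source (hypothetical).
    """
    return [
        reduce(fuse, (from_evidence(p, n) for p, n in sources)).expected()
        for sources in evidence_per_position
    ]


# Two ranked positions, three hypothetical evidence sources each.
scores = position_confidence([
    [(4.0, 0.0), (3.0, 1.0), (5.0, 0.0)],  # rank 1: sources mostly agree
    [(1.0, 1.0), (0.0, 2.0), (1.0, 0.0)],  # rank 2: weak, conflicting
])
```

With these made-up inputs, the first position receives a higher fused confidence than the second, which is the kind of per-position signal the abstract describes as missing from a single global confidence score.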
Problem

Research questions and friction points this paper is trying to address.

confidence estimation
LLM-based ranking
uncertainty quantification
reliability
ranking positions
Innovation

Methods, ideas, or system contributions that make the work stand out.

confidence estimation
LLM-based ranking
evidence aggregation
position-aware calibration
uncertainty quantification