EviRank: Evidence-Based Confidence Estimation for LLM-Based Ranking

📅 2026-06-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the inability of existing uncertainty quantification methods for large language models to provide fine-grained, position-aware assessments of result reliability in ranking tasks. The authors propose EviRank, an evidence-based confidence estimation method that extracts three complementary forms of evidence from a single forward pass and combines them through reliable opinion aggregation. A position-aware calibration mechanism, which the authors present as novel, then accounts for the unequal importance of ranking positions and yields fine-grained uncertainty estimates over ranked outputs. On three benchmark datasets, the authors report state-of-the-art performance in both recommendation accuracy and uncertainty estimation.

📝 Abstract

Large Language Models show promise for recommendation, but they raise reliability concerns due to limited domain coverage and inherent stochasticity. Existing uncertainty quantification methods face two fundamental challenges: (1) a global confidence score designed for question answering fails to reveal which positions in a ranked list are unreliable; (2) fine-grained confidence extracted from model internals exhibits uniformly low values across all positions, making it impossible to filter out unreliable predictions. To address these challenges, we propose EviRank, an evidence-based confidence estimation method for LLM-based ranking. We extract three complementary forms of evidence from a single forward pass and aggregate them via reliable opinion aggregation. Furthermore, we recognize that ranking positions are inherently unequal and introduce a position-aware calibration. Lastly, the calibrated confidence guides ranking optimization. Experiments on three datasets demonstrate that our method achieves state-of-the-art performance on both recommendation and uncertainty quantification.
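The abstract does not specify the three evidence types, the aggregation rule, or the form of the calibration, so the following is only a minimal sketch of what "aggregate evidence, then calibrate per position" could look like, not the authors' implementation. It assumes each evidence source yields a hypothetical (support, against) pair per ranking position, fuses the pairs by summing evidence as in cumulative fusion of binary subjective-logic opinions, and applies a hypothetical per-position temperature to the resulting confidence. All function names, the prior weight, and the temperatures are illustrative assumptions.

```python
import math

# Assumption: non-informative prior weight for a binary opinion (subjective logic convention).
PRIOR_WEIGHT = 2.0


def fuse_opinion(evidence_pairs, base_rate=0.5):
    """Fuse (support, against) evidence pairs into one binary opinion.

    Summing evidence corresponds to cumulative fusion in subjective logic.
    Returns (belief, disbelief, uncertainty, confidence).
    """
    support = sum(pos for pos, _ in evidence_pairs)
    against = sum(neg for _, neg in evidence_pairs)
    total = support + against + PRIOR_WEIGHT
    belief = support / total
    disbelief = against / total
    uncertainty = PRIOR_WEIGHT / total
    # Projected probability: belief plus the base-rate share of the uncertainty mass.
    confidence = belief + base_rate * uncertainty
    return belief, disbelief, uncertainty, confidence


def calibrate(confidence, temperature):
    """Temperature-scale a confidence in logit space (hypothetical calibration form)."""
    eps = 1e-6
    clipped = min(max(confidence, eps), 1.0 - eps)
    logit = math.log(clipped / (1.0 - clipped))
    return 1.0 / (1.0 + math.exp(-logit / temperature))


def position_confidences(evidence_by_position, temperatures):
    """Return one calibrated confidence per ranking position."""
    calibrated = []
    for pairs, temperature in zip(evidence_by_position, temperatures):
        *_, confidence = fuse_opinion(pairs)
        calibrated.append(calibrate(confidence, temperature))
    return calibrated


if __name__ == "__main__":
    # Two positions, three hypothetical evidence sources each.
    evidence = [
        [(3.0, 0.0), (2.0, 1.0), (1.0, 1.0)],  # position 1
        [(0.5, 1.0), (0.2, 0.8), (0.1, 0.4)],  # position 2
    ]
    print(position_confidences(evidence, temperatures=[1.0, 2.0]))
```

In this sketch a position with no evidence keeps all of its mass in the uncertainty term, which is one way to separate "confidently wrong" from "not enough evidence". A temperature above 1 pulls a position's confidence toward 0.5; in practice such per-position parameters would be fit on held-out data.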

Problem

Research questions and friction points this paper is trying to address.

confidence estimation

LLM-based ranking

uncertainty quantification

reliability

ranking positions

Innovation

Methods, ideas, or system contributions that make the work stand out.

confidence estimation

LLM-based ranking

evidence aggregation