Ranking Items from Discrete Ratings: The Cost of Unknown User Thresholds

📅 2025-10-02

🤖 AI Summary

This paper addresses the problem of recovering fine-grained item rankings from coarse-grained discrete user ratings (e.g., 1–5 stars) when users apply unknown, user-specific rating thresholds. We propose a probabilistic ordinal query model that jointly models latent item scores and personalized user thresholds, with ranking quality measured by Spearman distance. Our key contributions are fourfold. First, we show that heterogeneity in user thresholds is necessary for accurate ranking recovery, and that it carries an inherent cost when the thresholds are unknown. Second, we quantify the impact of a mismatch between the score and threshold distributions via a quadratic divergence factor. Third, we prove that achieving a near-perfect (ε-optimal) ranking requires Θ(n²) users and therefore Ω(n²) queries, significantly more than the O(n log n) queries needed to rank from pairwise comparisons. Finally, we design an algorithm whose query complexity matches this bound up to a logarithmic factor.
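A ranking method in this model outputs ordered bins whose items are still unordered, so scoring it by Spearman distance needs a convention for items that share a bin. The sketch below assumes Spearman's footrule (the sum of absolute rank differences) and gives every item in a bin the average rank of that bin; the paper may use a squared or normalized variant, and `bucket_positions` and `spearman_footrule` are hypothetical names.

```python
from typing import Sequence


def bucket_positions(bins: Sequence[Sequence[int]]) -> dict[int, float]:
    """Assign each item the average (1-indexed) rank of its bin.

    Assumption: items inside a bin are unordered, so each one gets the
    midpoint of the rank range that its bin occupies.
    """
    pos: dict[int, float] = {}
    start = 1
    for b in bins:  # bins are ordered best-first
        end = start + len(b) - 1
        mid = (start + end) / 2
        for item in b:
            pos[item] = mid
        start = end + 1
    return pos


def spearman_footrule(true_order: Sequence[int], bins: Sequence[Sequence[int]]) -> float:
    """Sum over items of |true rank - estimated (bin-midpoint) rank|."""
    true_pos = {item: r for r, item in enumerate(true_order, start=1)}
    est_pos = bucket_positions(bins)
    return sum(abs(true_pos[i] - est_pos[i]) for i in true_pos)


if __name__ == "__main__":
    truth = [0, 1, 2, 3]  # item 0 is best
    print(spearman_footrule(truth, [[0, 1, 2, 3]]))        # one coarse bin
    print(spearman_footrule(truth, [[0, 1], [2, 3]]))      # two ordered bins
    print(spearman_footrule(truth, [[0], [1], [2], [3]]))  # fully resolved
```

For the true order 0, 1, 2, 3, a single bin holding all four items scores 4.0, two correctly ordered bins of two items score 2.0, and the fully resolved correct order scores 0.0, so refining bins drives the distance toward zero.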

📝 Abstract

Ranking items is a central task in many information retrieval and recommender systems. User input for the ranking task often comes in the form of ratings on a coarse discrete scale. We ask whether it is possible to recover a fine-grained item ranking from such coarse-grained ratings. We model items as having scores and users as having thresholds; a user rates an item positively if the item's score exceeds the user's threshold. Although all users agree on the total item order, estimating that order is challenging when both the scores and the thresholds are latent. Under our model, any ranking method naturally partitions the $n$ items into bins; the bins are ordered, but the items inside each bin are still unordered. Users arrive sequentially, and every new user can be queried to refine the current ranking. We prove that achieving a near-perfect ranking, measured by Spearman distance, requires $\Theta(n^2)$ users (and therefore $\Omega(n^2)$ queries). This is significantly worse than the $O(n \log n)$ queries needed to rank from comparisons; the gap reflects the additional queries needed to identify the users who have the appropriate thresholds. Our bound also quantifies the impact of a mismatch between score and threshold distributions via a quadratic divergence factor. To show the tightness of our results, we provide a ranking algorithm whose query complexity matches our bound up to a logarithmic factor. Our work reveals a tension in online ranking: diversity in thresholds is necessary to merge coarse ratings from many users into a fine-grained ranking, but this diversity has a cost if the thresholds are a priori unknown.
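The threshold model and the bin refinement described in the abstract can be simulated in a few lines. This is a minimal sketch, not the authors' algorithm: it assumes every arriving user rates every item (the paper counts queries more carefully) and that scores and thresholds are drawn i.i.d. from Uniform(0, 1); `rate`, `refine`, and `simulate` are hypothetical names.

```python
import random
from typing import Sequence


def rate(scores: Sequence[float], threshold: float) -> list[bool]:
    """One user's binary ratings: positive iff the item's score exceeds the threshold."""
    return [s > threshold for s in scores]


def refine(bins: list[list[int]], ratings: Sequence[bool]) -> list[list[int]]:
    """Split every bin into its positively and negatively rated items.

    Bins are ordered best-first. Because all users share one item order,
    the positively rated items of a bin rank above its negatively rated ones.
    """
    new_bins: list[list[int]] = []
    for b in bins:
        pos = [i for i in b if ratings[i]]
        neg = [i for i in b if not ratings[i]]
        new_bins.extend(part for part in (pos, neg) if part)
    return new_bins


def simulate(n: int, num_users: int, seed: int = 0) -> tuple[list[float], list[list[int]]]:
    """Hypothetical setup (assumption): scores and thresholds i.i.d. Uniform(0, 1)."""
    rng = random.Random(seed)
    scores = [rng.random() for _ in range(n)]
    bins = [list(range(n))]  # initially all items share one unordered bin
    for _ in range(num_users):
        threshold = rng.random()  # each arriving user has a fresh latent threshold
        bins = refine(bins, rate(scores, threshold))
    return scores, bins


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.5, 0.7]
    bins = refine([[0, 1, 2, 3]], rate(scores, 0.6))
    print(bins)                                # [[0, 3], [1, 2]]
    print(refine(bins, rate(scores, 0.65)))    # unchanged: no bin is split
```

A user refines the ranking only if their threshold separates at least two items that currently share a bin; in the toy example, thresholds 0.6 and 0.65 produce identical ratings on scores 0.9, 0.1, 0.5, 0.7. As bins shrink, most new users fall between bins and contribute nothing, which is consistent with the abstract's point that the extra cost lies in identifying users with appropriate thresholds.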

Problem

Recovering fine-grained item rankings from coarse-grained discrete user ratings

Addressing the challenge of unknown user thresholds in rating aggregation

Quantifying query complexity for achieving near-perfect ranking accuracy

Innovation

Modeling users with latent thresholds for rating items

Proving that a near-perfect ranking requires a quadratic number of users

Developing an algorithm whose query complexity matches this bound up to a logarithmic factor

🔎 Similar Papers

Review-based Recommender Systems: A Survey of Approaches, Challenges and Future Perspectives