Bradley-Terry Rankings for Recommender Systems Across Dataset Taxonomies

📅 2026-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of fairly evaluating recommendation algorithms across diverse datasets, where performance is influenced by factors such as sparsity, sequential structure, and scale. Conventional evaluation methods relying on average metrics often distort algorithm rankings and hinder equitable comparison. To overcome this, the authors propose a data-driven ranking framework based on the Bradley–Terry (BT) model, enhanced with BT trees and covariate-augmented BT models that incorporate dataset-specific statistical features. This approach yields more consistent and robust evaluation metrics, accurately capturing how dataset characteristics affect algorithmic performance. Notably, it maintains ranking stability even under partial data missingness and enables prediction of an algorithm’s relative performance on unseen datasets without requiring actual execution.
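For readers unfamiliar with the model, the standard Bradley-Terry form can be written out as follows. The first line is the textbook definition. The second line is only one common way of attaching covariates and is an assumption here, not necessarily the parameterization the authors use; the symbols $\pi_i$, $\theta_i$, $\alpha_i$, $\beta_i$, and $x_d$ are introduced for illustration.

```latex
% Textbook Bradley-Terry: algorithm i beats algorithm j with probability
\[
  P(i \succ j) \;=\; \frac{\pi_i}{\pi_i + \pi_j}
  \;=\; \frac{1}{1 + e^{-(\theta_i - \theta_j)}},
  \qquad \theta_i = \log \pi_i .
\]
% Assumed covariate extension: the log-strength of algorithm i on dataset d
% is linear in a vector x_d of dataset statistics (e.g. sparsity, scale).
\[
  \theta_i(d) \;=\; \alpha_i + \beta_i^{\top} x_d ,
  \qquad
  P(i \succ j \mid d) \;=\; \frac{1}{1 + e^{-(\theta_i(d) - \theta_j(d))}} .
\]
```

Under this assumed form, predicting a ranking on an unseen dataset only requires its statistics $x_d$, which matches the summary's claim that no model needs to be run.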

📝 Abstract

The ranking of recommendation algorithms is a challenging problem since model performance is sensitive to dataset characteristics such as sparsity, sequential structure, and scale. This drives a demand for a proper methodology for fair comparison between algorithms. Naive aggregation of performance metrics (e.g., averaging NDCG over benchmarks) can yield misleading rankings, undermining practical selection. To address this problem, we introduce a novel, data-driven ranking methodology based on the Bradley-Terry (BT) model. We demonstrate that the obtained ranking depends on key dataset statistics. Additionally, we propose a novel metric for evaluating ranking consistency and demonstrate the robustness of our ranking to incomplete data. Finally, we introduce a dataset-specific methodology for ranking algorithms on unseen datasets without running the models, relying on extensions of the Bradley-Terry framework, including BT trees and BT models with covariates.
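As a rough illustration of the basic idea, the sketch below turns per-dataset metric values into pairwise "wins" between algorithms and fits plain Bradley-Terry strengths with the classical minorization-maximization (MM) update. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `prior` pseudo-count, the algorithm labels `A`, `B`, `C`, and the toy scores are all hypothetical, and the BT trees and covariate models mentioned in the abstract are not covered.

```python
from itertools import combinations


def pairwise_wins(scores):
    """Count pairwise wins from {dataset: {algorithm: metric}} (higher is better).

    An algorithm missing on a dataset contributes no comparisons there,
    so incomplete result tables are handled without imputation.
    """
    algos = sorted({a for row in scores.values() for a in row})
    wins = {a: {b: 0.0 for b in algos if b != a} for a in algos}
    for row in scores.values():
        for a, b in combinations(sorted(row), 2):
            if row[a] > row[b]:
                wins[a][b] += 1.0
            elif row[b] > row[a]:
                wins[b][a] += 1.0
            else:  # a tie counts as half a win for each side
                wins[a][b] += 0.5
                wins[b][a] += 0.5
    return algos, wins


def fit_bradley_terry(algos, wins, iters=200, prior=0.1):
    """Fit BT strengths with the MM update p_i = W_i / sum_j n_ij / (p_i + p_j).

    `prior` is an assumed pseudo-count of wins added in both directions for
    every pair; it keeps strengths finite when an algorithm never loses.
    """
    p = {a: 1.0 / len(algos) for a in algos}
    for _ in range(iters):
        new_p = {}
        for a in algos:
            total_wins = 0.0
            denom = 0.0
            for b in algos:
                if b == a:
                    continue
                w_ab = wins[a][b] + prior
                w_ba = wins[b][a] + prior
                total_wins += w_ab
                denom += (w_ab + w_ba) / (p[a] + p[b])
            new_p[a] = total_wins / denom
        norm = sum(new_p.values())
        p = {a: v / norm for a, v in new_p.items()}  # strengths sum to 1
    return p


# Hypothetical toy metric values (e.g. NDCG); algorithm C was not run on d3.
toy_scores = {
    "d1": {"A": 0.30, "B": 0.25, "C": 0.10},
    "d2": {"A": 0.20, "B": 0.22, "C": 0.15},
    "d3": {"A": 0.40, "B": 0.35},
    "d4": {"A": 0.12, "B": 0.10, "C": 0.11},
}

algos, wins = pairwise_wins(toy_scores)
strengths = fit_bradley_terry(algos, wins)
ranking = sorted(algos, key=strengths.get, reverse=True)
print(ranking)
```

Because only the outcome of each pairwise comparison enters the fit, the magnitude of a metric gap on any single dataset cannot dominate the ranking the way it can when metrics are averaged.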

Problem

Research questions and friction points this paper is trying to address.

Bradley-Terry model

recommender systems

algorithm ranking

dataset characteristics

fair comparison

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bradley-Terry model

recommender systems

dataset taxonomy