🤖 AI Summary
This study addresses the susceptibility of parametric models to misspecification in high-dimensional paired-comparison data by proposing a semiparametric framework that models item merits and covariate effects through latent random variables with unspecified distributions. The method uses kernel-based least squares estimation and, according to the authors, is the first semiparametric analysis of paired comparisons with covariates in which the dimension diverges, balancing model flexibility with valid statistical inference. Theoretical results establish the consistency and asymptotic normality of the proposed estimators. Simulations and an empirical analysis of NBA data illustrate the finite-sample performance and practical utility of the approach.
📝 Abstract
Statistical inference in parametric models (e.g., the Bradley–Terry model and its variants) for paired-comparison data has been explored in the high-dimensional regime, in which the number of items involved in paired comparisons diverges. However, parametric models are highly susceptible to model misspecification. To relax the assumption of known distributions and provide flexibility, we propose a semiparametric framework for modeling the merits of items and covariate effects (e.g., home-field advantage) by introducing latent random variables with unspecified distributions. As the number of parameters increases with the number of items, semiparametric inference is highly nontrivial. To address this issue, we employ a kernel-based least squares approach to estimate all unknown parameters. When each pair of items has a fixed number of comparisons and the number of items tends to infinity, we prove the consistency of all resulting estimators and derive their asymptotic normal distributions. To the best of our knowledge, this is the first study to conduct a semiparametric analysis of paired comparisons with an increasing dimension. We conduct simulations to evaluate the finite-sample performance of the proposed method and illustrate its practical utility by analyzing an NBA dataset.
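
As a rough illustration of the misspecification concern raised above, the sketch below contrasts the parametric Bradley–Terry win probability with a latent-variable simulation in which the noise distribution is a pluggable argument. It is not the paper's kernel-based estimator. The function names, the additive home-advantage term `home_adv`, and the rule "item i wins when merit difference plus noise is positive" are assumptions made for this example only. With standard logistic noise the simulation reproduces the Bradley–Terry probability; with another noise law (here Gaussian) it generally does not, which is the situation where an unspecified latent distribution would help.

```python
import math
import random


def bt_win_prob(merit_i, merit_j, home_adv=0.0):
    """Bradley-Terry probability that item i (playing at home) beats item j.

    home_adv is a hypothetical additive covariate effect on the logit scale.
    """
    return 1.0 / (1.0 + math.exp(-(merit_i + home_adv - merit_j)))


def logistic_noise(rng):
    """Draw standard logistic noise by inverting its CDF."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u away from 0 and 1
    return math.log(u / (1.0 - u))


def gaussian_noise(rng):
    """Draw standard normal noise (a different, non-Bradley-Terry latent law)."""
    return rng.gauss(0.0, 1.0)


def simulate_win_rate(merit_i, merit_j, home_adv, noise, n, rng):
    """Fraction of n simulated comparisons won by item i.

    Assumed latent rule: i wins iff merit_i + home_adv - merit_j + noise > 0.
    """
    delta = merit_i + home_adv - merit_j
    wins = sum(1 for _ in range(n) if delta + noise(rng) > 0)
    return wins / n


if __name__ == "__main__":
    rng = random.Random(0)
    p_bt = bt_win_prob(1.0, 0.2, home_adv=0.3)
    p_logistic = simulate_win_rate(1.0, 0.2, 0.3, logistic_noise, 100_000, rng)
    p_gaussian = simulate_win_rate(1.0, 0.2, 0.3, gaussian_noise, 100_000, rng)
    print(f"Bradley-Terry formula:  {p_bt:.3f}")
    print(f"logistic latent noise:  {p_logistic:.3f}")
    print(f"Gaussian latent noise:  {p_gaussian:.3f}")
```

Fitting Bradley–Terry to data generated under the Gaussian rule would therefore target the wrong win probabilities, whereas a framework that leaves the latent distribution unspecified does not commit to either choice.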