🤖 AI Summary
Existing sports performance evaluation methods rely on machine learning models to estimate “expected outcomes,” then compare them against actual performance. However, flexible ML models often suffer from small-sample bias and slow convergence, undermining reliable statistical inference.
Method: We propose a framework for player evaluation that uses flexible machine learning models yet still supports valid frequentist inference. It formally links conventional performance metrics, such as goals above expectation (GAX), to Rao's score test in parametric regression models for the expected outcome. Drawing on double machine learning, the framework replaces the original metrics with residualized versions and relates them to player-specific effect estimates in interpretable semiparametric regression models, which allows directional conclusions such as identifying players with a positive impact on the outcome.
Results: The framework is applied to GAX in soccer as its primary use case, and further to the goal-stopping ability of goalkeepers, shooting skill in basketball, quarterback passing skill in American football, and injury-proneness of soccer players.
📝 Abstract
A popular quantitative approach to evaluating player performance in sports involves comparing an observed outcome to an expected outcome that ignores player involvement and is estimated using statistical or machine learning methods. In soccer, for instance, a player's goals above expectation (GAX) measure how often the player's shots led to a goal relative to the model-derived expected outcome of those shots. Typically, sports data analysts rely on flexible machine learning models, which are capable of handling complex nonlinear effects and feature interactions, but fail to provide valid statistical inference due to finite-sample bias and slow convergence rates. In this paper, we close this gap by presenting a framework for player evaluation with metrics derived from differences between actual and expected outcomes using flexible machine learning algorithms, which nonetheless allows for valid frequentist inference. We first show that the commonly used metrics are directly related to Rao's score test in parametric regression models for the expected outcome. Motivated by this finding and recent developments in double machine learning, we then propose the use of residualized versions of the original metrics. For GAX, the residualization step corresponds to an additional regression predicting whether a given player would take the shot under the circumstances described by the features. We further relate metrics in the proposed framework to player-specific effect estimates in interpretable semiparametric regression models, allowing us to infer directional effects, e.g., to identify players who have a positive impact on the outcome. Our primary use case is GAX in soccer. We further apply our framework to evaluate the goal-stopping ability of goalkeepers, shooting skill in basketball, quarterback passing skill in American football, and injury-proneness of soccer players.
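
As a rough illustration of the residualization idea in the abstract, the Python sketch below computes classic GAX and a residualized variant from predictions that are assumed to be already available. The function names, the toy numbers, and the studentized z-statistic are assumptions made for illustration and are not taken from the paper. In practice the expected-goal values and the shot-taking probabilities would come from flexible models, typically evaluated on held-out data.

```python
from math import sqrt


def gax(goals, xg, took_shot):
    """Classic GAX: sum of (goal - xG) over the shots taken by the player."""
    return sum(t * (y - m) for y, m, t in zip(goals, xg, took_shot))


def residual_products(goals, xg, took_shot, p_shot):
    """Per-shot product of outcome residual and shot-taking residual."""
    return [(y - m) * (t - p) for y, m, t, p in zip(goals, xg, took_shot, p_shot)]


def residualized_gax(goals, xg, took_shot, p_shot):
    """Residualized GAX (assumed form): sum of the residual products over ALL shots."""
    return sum(residual_products(goals, xg, took_shot, p_shot))


def z_statistic(goals, xg, took_shot, p_shot):
    """Studentized statistic for the residualized metric.

    Assumption: a generic sqrt(n) * mean / sd normalization; the paper's
    exact test statistic may differ.
    """
    r = residual_products(goals, xg, took_shot, p_shot)
    n = len(r)
    mean = sum(r) / n
    var = sum(v * v for v in r) / n - mean * mean
    return sqrt(n) * mean / sqrt(var)


# Toy data (hypothetical): six shots, one player of interest.
goals = [1, 0, 0, 1, 0, 1]                      # 1 if the shot was a goal
xg = [0.5, 0.25, 0.125, 0.5, 0.25, 0.75]        # model-predicted goal probability
took_shot = [1, 0, 1, 0, 0, 1]                  # 1 if the player took the shot
p_shot = [0.5, 0.25, 0.5, 0.25, 0.25, 0.5]      # predicted prob. the player takes it

print(gax(goals, xg, took_shot))                       # 0.625
print(residualized_gax(goals, xg, took_shot, p_shot))  # 0.3125
print(z_statistic(goals, xg, took_shot, p_shot))
```

The classic metric only uses the player's own shots, whereas the residualized version weights every shot by how surprising it is that the player did or did not take it. A positive statistic points toward a positive player effect; with so few toy shots it carries no inferential weight.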