🤖 AI Summary
Football offensive statistics are confounded by match context, such as score differential, red cards, home/away status, and pre-match win probability, which can bias assessments of how well a team played. To address this, we fit count-response generalized additive models (GAMs) to minute-level event data from 15 seasons across five major European leagues. The models capture nonlinear effects of these contextual covariates and of game minute, and they test interactions among them. The selected model is then used to adjust offensive metrics such as shots and corners for context by projecting raw statistics onto a common baseline scenario: a tied home game with even numbers of players on both sides. This makes comparisons across matches and teams more consistent and gives an interpretable statistical framework that reduces the distortion that game state introduces into offensive performance evaluation.
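As an illustrative template only, a count-response GAM of this kind, assuming a Poisson (or negative binomial) likelihood with a log link, could take the form below. Here $Y_{it}$ is the number of offensive events (e.g., shots) by team $i$ in minute $t$, $p_i$ is the team's pre-match win probability, each $f_j$ is a smooth function, and $f_{13}$ stands in for one possible interaction. The paper's exact terms, smoothers, and selected interactions are not specified here.

$$
\log \mathbb{E}[Y_{it}] = \beta_0 + f_1(\text{ScoreDiff}_{it}) + f_2(\text{RedCardDiff}_{it}) + \beta_H\,\text{Home}_i + f_3(p_i) + f_4(t) + f_{13}(\text{ScoreDiff}_{it}, p_i)
$$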
📝 Abstract
In soccer, game context can skew offensive statistics in ways that misrepresent how well a team has played. For instance, in England's 1-2 loss to France in the 2022 FIFA World Cup quarterfinal, England attempted twice as many shots (16 to France's 8) and more corners (5 to 2), which might suggest they played better despite the loss. However, these statistics were largely accumulated while France was ahead and more willing to concede offensive initiative to England. To explore how game context influences offensive performance, we analyze minute-by-minute, event-sequenced match data from 15 seasons across five major European leagues. Using count-response Generalized Additive Models, we consider features such as score and red card differentials, home/away status, pre-match win probabilities, and game minute. We also use interaction terms to test several intuitive hypotheses about how these features might jointly explain offensive production. The selected model is then applied to project offensive statistics onto a standardized "common denominator" scenario: a tied home game with even numbers of players on both sides. Unlike regular game totals, which disregard game context, the adjusted numbers offer a more contextualized comparison and reduce the likelihood of misrepresenting the relative quality of play.
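As a rough illustration of the "common denominator" projection, the sketch below assumes a Poisson log-link rate model and one possible adjustment scheme: each minute's observed count is rescaled by the ratio of the model's expected rate in the reference scenario (tied, at home, even numbers of players) to its expected rate in the actual context, and the rescaled counts are summed. This ratio scheme, the names `MinuteContext`, `log_rate`, and `adjust`, and every coefficient value are hypothetical and not taken from the paper. The smooth terms of a real GAM are replaced by capped linear terms, and game minute and interactions are omitted for brevity.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class MinuteContext:
    score_diff: int   # own goals minus opponent goals at the start of the minute
    man_diff: int     # own players on the pitch minus opponent players
    is_home: bool     # True if this team is playing at home
    win_prob: float   # this team's pre-match win probability


def log_rate(ctx: MinuteContext) -> float:
    """Hypothetical log expected count per minute (illustrative, NOT fitted values)."""
    sd = max(-2, min(2, ctx.score_diff))  # cap extreme score differentials
    md = max(-2, min(2, ctx.man_diff))    # cap extreme player differentials
    eta = math.log(0.2)                   # baseline rate; cancels in the ratio below
    eta += -0.15 * sd                     # assumed: trailing teams attack more
    eta += 0.25 * md                      # assumed: a player advantage raises output
    eta += 0.10 if ctx.is_home else 0.0   # assumed home effect
    eta += 1.0 * (ctx.win_prob - 0.5)     # assumed effect of pre-match strength
    return eta


# Reference scenario: tied, at home, even numbers of players.
# Pre-match win probability is kept at the team's own value.
REFERENCE = dict(score_diff=0, man_diff=0, is_home=True)


def adjust(counts: Sequence[int], contexts: Sequence[MinuteContext]) -> float:
    """Rescale each minute's count to the reference scenario and sum the results."""
    total = 0.0
    for y, ctx in zip(counts, contexts):
        ref = MinuteContext(win_prob=ctx.win_prob, **REFERENCE)
        total += y * math.exp(log_rate(ref) - log_rate(ctx))
    return total
```

Under these made-up coefficients, one shot by an away team trailing by one goal is worth exp(0.10 - 0.15) ≈ 0.95 adjusted shots. Trailing is assumed to inflate the shot rate, while the home status of the reference scenario is assumed to raise it slightly. Scaling by a rate ratio keeps the team's own output in the total and removes only the modeled context effect.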