🤖 AI Summary
This paper addresses unreliable tail extrapolation in semiparametric density regression, particularly for heavy-tailed data or predictions beyond the observed range, where semi-parametric quantile regression (SPQR) satisfies none of the asymptotic guarantees provided by extreme value theory (EVT). To resolve this, we propose xSPQR (extremal SPQR), which combines SPQR's spline-based, neural-network-learned representation of the conditional density with a new blended generalized Pareto (GP) distribution. The blended GP models the entire range of the data and transitions smoothly and continuously into an exact GP upper tail, so no intermediate threshold has to be selected. The result is a flexible density regression method that is compliant with traditional EVT. Interpretability is handled through model-agnostic variable importance scores, which give the relative importance of each covariate separately for the bulk and for the tail of the conditional density. The method is illustrated on simulated data and in an application to U.S. wildfire burnt areas.
📝 Abstract
Semi-parametric quantile regression (SPQR) is a flexible approach to density regression that learns a spline-based representation of conditional density functions using neural networks. As it makes no parametric assumptions about the underlying density, SPQR performs well for in-sample testing and interpolation. However, it can perform poorly when modelling heavy-tailed data or when asked to extrapolate beyond the range of observations, as it fails to satisfy any of the asymptotic guarantees provided by extreme value theory (EVT). To build semi-parametric density regression models that can be used for reliable tail extrapolation, we create the blended generalised Pareto (GP) distribution, which i) provides a model for the entire range of data and, via a smooth and continuous transition, ii) benefits from exact GP upper-tails without the need for intermediate threshold selection. We combine SPQR with our blended GP to create extremal semi-parametric quantile regression (xSPQR), which provides a flexible semi-parametric approach to density regression that is compliant with traditional EVT. We handle interpretability of xSPQR through the use of model-agnostic variable importance scores, which provide the relative importance of a covariate for separately determining the bulk and tail of the conditional density. The efficacy of xSPQR is illustrated on simulated data and in an application to U.S. wildfire burnt areas.
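For readers unfamiliar with the GP upper tail mentioned above, the following is a minimal sketch of the classical peaks-over-threshold calculation that EVT motivates. It is not the blended GP or xSPQR itself, whose construction is not given in this abstract. The function names (`gp_cdf`, `gp_quantile`, `pot_quantile`) and the parameter values are hypothetical, chosen only for illustration. Note that the classical recipe needs a threshold `u` to be picked by hand, which is the step the blended GP is designed to avoid.

```python
import math


def gp_cdf(y, sigma, xi):
    """CDF of the generalised Pareto distribution for an excess y over a threshold.

    sigma > 0 is the scale and xi the shape; xi > 0 gives a heavy upper tail.
    """
    if y <= 0.0:
        return 0.0
    if abs(xi) < 1e-12:
        # Limit xi -> 0 is the exponential distribution.
        return 1.0 - math.exp(-y / sigma)
    z = 1.0 + xi * y / sigma
    if z <= 0.0:
        # For xi < 0 the support has a finite upper endpoint at -sigma / xi.
        return 1.0
    return 1.0 - z ** (-1.0 / xi)


def gp_quantile(p, sigma, xi):
    """Inverse of gp_cdf for a probability level 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if abs(xi) < 1e-12:
        return -sigma * math.log1p(-p)
    return sigma / xi * ((1.0 - p) ** (-xi) - 1.0)


def pot_quantile(p, u, p_u, sigma, xi):
    """Extrapolated quantile at level p under a peaks-over-threshold model.

    u is a hand-picked threshold and p_u = P(Y <= u); exceedances of u are
    assumed to follow a GP distribution with parameters (sigma, xi).
    """
    if not p_u < p < 1.0:
        raise ValueError("p must lie in (p_u, 1)")
    conditional_level = (p - p_u) / (1.0 - p_u)
    return u + gp_quantile(conditional_level, sigma, xi)


if __name__ == "__main__":
    # Hypothetical values: threshold 10 at the 90% level, sigma = 1, xi = 0.5.
    for level in (0.95, 0.975, 0.99, 0.999):
        print(level, pot_quantile(level, u=10.0, p_u=0.9, sigma=1.0, xi=0.5))
```

With a positive shape `xi`, the extrapolated quantiles grow polynomially in `1 / (1 - p)` rather than logarithmically, which is the heavy-tailed behaviour that a purely spline-based density has no mechanism to reproduce beyond the observed range.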