🤖 AI Summary
Existing methods for evaluating probabilistic forecasts cannot characterize tail calibration, that is, the reliability of predictions for high-impact extreme outcomes, which is increasingly vital for risk-informed decision-making.
Method: This paper introduces a general definition of tail calibration, relates it rigorously to classical notions of probabilistic calibration, and connects it to the peaks-over-threshold (POT) framework from extreme value theory. Building on these connections, it develops diagnostic tools for assessing whether a forecast's tail is reliable.
Contribution/Results: In a case study on European precipitation forecasts, the framework lets forecasters assess the reliability of their predictions for extreme outcomes, enabling rigorous evaluation of tail behavior in probabilistic forecasts and supporting risk assessment and decision-making for high-impact, rare events.
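The POT connection referenced above can be made concrete with a small example. The following is a minimal sketch, not the paper's implementation: it uses synthetic gamma-distributed data as a stand-in for precipitation and fits a generalized Pareto distribution to exceedances over a high threshold with scipy.stats.genpareto, illustrating the peaks-over-threshold approximation the framework builds on.

```python
import numpy as np
from scipy import stats

# Peaks-over-threshold: exceedances over a high threshold are approximately
# generalized Pareto distributed (Pickands-Balkema-de Haan theorem).
rng = np.random.default_rng(1)
x = rng.gamma(shape=2.0, scale=10.0, size=20_000)  # stand-in for precipitation
u = np.quantile(x, 0.95)                           # high threshold
excess = x[x > u] - u                              # exceedances above u

# Fix the GPD location at zero, the standard POT convention,
# so only the shape and scale parameters are estimated.
shape, loc, scale = stats.genpareto.fit(excess, floc=0.0)
print(f"GPD shape={shape:.3f}, scale={scale:.3f}")
```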
📝 Abstract
Probabilistic forecasts comprehensively describe the uncertainty in the unknown future outcome, making them essential for decision making and risk management. While several methods have been introduced to evaluate probabilistic forecasts, existing evaluation techniques are ill-suited to the evaluation of tail properties of such forecasts. However, these tail properties are often of particular interest to forecast users due to the severe impacts caused by extreme outcomes. In this work, we introduce a general notion of tail calibration for probabilistic forecasts, which allows forecasters to assess the reliability of their predictions for extreme outcomes. We study the relationships between tail calibration and standard notions of forecast calibration, and discuss connections to peaks-over-threshold models in extreme value theory. Diagnostic tools are introduced and applied in a case study on European precipitation forecasts.
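To illustrate the kind of diagnostic involved, here is a hedged sketch of a conditional probability integral transform (PIT) check: restrict attention to outcomes that exceed a high threshold and transform each by its forecast distribution conditioned on exceedance; under a tail-calibrated forecast these values should be approximately uniform on [0, 1]. The function name conditional_pit and the use of scipy.stats frozen distributions are illustrative assumptions, not the paper's API.

```python
import numpy as np
from scipy import stats

def conditional_pit(forecasts, obs, threshold):
    """Conditional PIT values for observations exceeding `threshold`.

    forecasts : list of frozen scipy.stats distributions (one per case)
    obs       : array of realized outcomes
    threshold : high threshold defining the tail of interest
    """
    pit = []
    for F, y in zip(forecasts, obs):
        if y > threshold:
            p_t = F.cdf(threshold)
            if p_t < 1.0:  # the forecast must put mass beyond the threshold
                # PIT of the forecast distribution conditioned on exceedance
                pit.append((F.cdf(y) - p_t) / (1.0 - p_t))
    return np.asarray(pit)

# Toy example: outcomes are drawn from the forecast distributions themselves,
# so the conditional PIT histogram should look roughly uniform.
rng = np.random.default_rng(0)
mus = rng.normal(size=2000)
forecasts = [stats.norm(loc=m, scale=1.0) for m in mus]
obs = rng.normal(loc=mus, scale=1.0)

z = conditional_pit(forecasts, obs, threshold=1.5)
hist, _ = np.histogram(z, bins=10, range=(0.0, 1.0))
print(hist)  # roughly equal bin counts suggest tail calibration
```

Systematic deviations from uniformity in such a histogram, for example a spike near 1, would indicate that the forecasts understate the severity of extreme outcomes.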