Hierarchies of Calibration: Classification meets Regression

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study unifies notions of calibration across classification and regression, where calibration means that predictive distributions are statistically compatible with the observed outcomes. It covers general real-valued, continuous, count, nominal, and binary outcomes. The work introduces modal calibration for nominal outcomes and distinguishes full, partial, and average calibration in that setting. It also generalizes existing results on calibration notions defined through functionals of the predictive distribution, such as means, quantiles, and event probabilities. Key contributions include showing that double probability integral transform (PIT) calibration is logically independent of previously proposed calibration notions for discrete outcomes, clarifying the hierarchical relations among the various notions, and providing algorithmic tools for constructing instructive examples and counterexamples.

📝 Abstract

Concepts of calibration formalize the compatibility between probabilistic predictions and the respective outcomes. In a nutshell, the outcomes ought to be indistinguishable from random draws from the predictive distributions. In this paper, we review, extend, and bridge notions of calibration that have been proposed for classification and regression tasks. Particular emphasis is given to hierarchical relations between the various notions, as they apply to general real-valued data, continuous outcomes, count data, nominal classes, and binary outcomes. To highlight a number of contributions, we introduce the notion of modal calibration for nominal outcomes, we distinguish full, partial, and average calibration in this setting, and we show that double probability integral transform (PIT) calibration is logically independent of previously proposed concepts of calibration for discrete outcomes. Furthermore, we generalize extant results on concepts of calibration that are expressed in terms of properties or functionals of the predictive distributions, such as means, quantiles, or event probabilities. Throughout the paper, we illustrate the concepts and their hierarchical relations in worked examples, and we provide algorithmic tools that support the construction of instructive examples and counterexamples.
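
To make the "indistinguishable from random draws" idea concrete, the following is a minimal sketch of the standard probability integral transform (PIT) check for continuous outcomes, not the paper's own algorithms. If the outcome Y is a draw from the predictive CDF F, then F(Y) is uniform on [0, 1]. The simulation setup (Gaussian outcomes, the function names `pit_values` and `pit_histogram`, and the `forecast_sd` parameter) is an illustrative assumption and does not come from the paper.

```python
import random
from statistics import NormalDist, fmean, pvariance

def pit_values(n, forecast_sd=1.0, seed=0):
    """Simulate n forecast cases and return the PIT values F(y).

    Assumed toy setup: the outcome is y ~ N(mu, 1) with mu ~ N(0, 1),
    and the forecaster issues N(mu, forecast_sd). The forecast is
    calibrated when forecast_sd equals the true spread of 1.
    """
    rng = random.Random(seed)
    pits = []
    for _ in range(n):
        mu = rng.gauss(0.0, 1.0)   # information available to the forecaster
        y = rng.gauss(mu, 1.0)     # observed outcome
        pits.append(NormalDist(mu, forecast_sd).cdf(y))
    return pits

def pit_histogram(pits, bins=10):
    """Relative frequencies of PIT values in equal-width bins on [0, 1]."""
    counts = [0] * bins
    for z in pits:
        counts[min(int(z * bins), bins - 1)] += 1
    return [c / len(pits) for c in counts]

calibrated = pit_values(20000)                       # forecast spread matches the truth
overconfident = pit_values(20000, forecast_sd=0.5)   # forecast spread too narrow

print("calibrated   :", [round(f, 3) for f in pit_histogram(calibrated)])
print("overconfident:", [round(f, 3) for f in pit_histogram(overconfident)])
print("PIT variance :", round(pvariance(calibrated), 4), "vs 1/12 =", round(1 / 12, 4))
```

In this sketch the calibrated forecaster yields a roughly flat PIT histogram with variance near 1/12, while the overconfident forecaster yields a U-shaped histogram with larger variance. A flat PIT histogram is only one notion of calibration; the abstract's point is that several such notions exist and are related hierarchically.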

Problem

Research questions and friction points this paper is trying to address.

calibration

classification

regression

hierarchical relations

probabilistic predictions

Innovation

Methods, ideas, or system contributions that make the work stand out.

modal calibration

hierarchical calibration

double PIT