SEMF: Supervised Expectation-Maximization Framework for Predicting Intervals

๐Ÿ“… 2024-05-28
๐Ÿ›๏ธ arXiv.org
๐Ÿ“ˆ Citations: 0
โœจ Influential: 0
๐Ÿค– AI Summary
This work addresses limitations of existing supervised uncertainty estimation methods: overly wide prediction intervals, reliance on the quantile (pinball) loss, and poor generalization. The authors propose SEMF, a model-agnostic framework that, for the first time, brings the Expectation-Maximization (EM) algorithm to supervised uncertainty quantification. SEMF uses latent-variable modeling to produce prediction intervals for arbitrary black-box models (e.g., tree-based models and neural networks) without requiring the quantile loss or model retraining, and combines supervised EM, latent-variable inference, and conformalized quantile regression. Evaluated on 11 real-world tabular datasets, SEMF narrows prediction intervals (an average width reduction of 12.7%) while maintaining the target coverage (e.g., 90%), consistently outperforming conventional quantile regression and conformal prediction baselines. The key contribution is adapting the unsupervised EM paradigm to supervised interval prediction, improving both the accuracy and robustness of uncertainty quantification.

๐Ÿ“ Abstract
This work introduces the Supervised Expectation-Maximization Framework (SEMF), a versatile and model-agnostic approach for generating prediction intervals with any ML model. SEMF extends the Expectation-Maximization algorithm, traditionally used in unsupervised learning, to a supervised context, leveraging latent variable modeling for uncertainty estimation. Through extensive empirical evaluation of diverse simulated distributions and 11 real-world tabular datasets, SEMF consistently produces narrower prediction intervals while maintaining the desired coverage probability, outperforming traditional quantile regression methods. Furthermore, without using the quantile (pinball) loss, SEMF allows point predictors, including gradient-boosted trees and neural networks, to be calibrated with conformal quantile regression. The results indicate that SEMF enhances uncertainty quantification under diverse data distributions and is particularly effective for models that otherwise struggle with inherent uncertainty representation.
Problem

Research questions and friction points this paper is trying to address.

Generating well-calibrated prediction intervals for arbitrary ML models
Extending the EM algorithm to supervised uncertainty estimation
Improving uncertainty quantification across diverse data distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends EM algorithm to supervised learning
Leverages latent variables for uncertainty estimation
Calibrates point predictors without quantile loss
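The calibration idea behind the last bullet can be sketched with a plain split-conformal wrapper around a point predictor. This is a simplified stand-in for the conformalized quantile regression step SEMF builds on, not the paper's implementation; the data, split sizes, and model below are illustrative assumptions:

```python
import numpy as np

# Hedged sketch: split-conformal calibration of a black-box point
# predictor. All names and data here are illustrative, not from SEMF.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(-3.0, 3.0, n)
y = x + rng.normal(0.0, 0.5, n)  # synthetic regression data

# split into train / calibration / test folds
x_tr, y_tr = x[:1000], y[:1000]
x_cal, y_cal = x[1000:1500], y[1000:1500]
x_te, y_te = x[1500:], y[1500:]

# any black-box point model would do; here, a least-squares line
slope, intercept = np.polyfit(x_tr, y_tr, 1)

def predict(v):
    return slope * v + intercept

# conformal step: quantile of absolute residuals on the calibration set
alpha = 0.1  # target 90% marginal coverage
scores = np.abs(y_cal - predict(x_cal))
k = int(np.ceil((len(scores) + 1) * (1 - alpha)))
q_hat = np.sort(scores)[k - 1]

# symmetric interval around the point prediction
lower, upper = predict(x_te) - q_hat, predict(x_te) + q_hat
coverage = float(np.mean((y_te >= lower) & (y_te <= upper)))
print(f"empirical coverage: {coverage:.3f}")
```

The finite-sample quantile index `k` gives the standard split-conformal marginal coverage guarantee; SEMF's contribution is producing the underlying intervals via latent-variable EM rather than a symmetric residual band like this one.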
Ilia Azizi
University of Lausanne
machine learning · statistics
M. Boldi
HEC, University of Lausanne
V. Chavez-Demoulin
HEC, University of Lausanne, Expertise Center for Climate Extremes (ECCE)