SEMF: Supervised Expectation-Maximization Framework for Predicting Intervals

📅 2024-05-28

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the limitations of existing supervised uncertainty estimation methods—namely, overly wide prediction intervals, reliance on quantile loss, and poor generalization. We propose SEMF, a model-agnostic framework that introduces the Expectation-Maximization (EM) algorithm to supervised uncertainty quantification for the first time. SEMF employs latent-variable modeling to calibrate prediction intervals for arbitrary black-box models (e.g., tree-based models and neural networks), without requiring quantile loss or model retraining. It integrates supervised EM, latent-variable inference, and conformalized quantile regression ensembling. Evaluated on 11 real-world tabular datasets, SEMF significantly narrows prediction interval width (average reduction of 12.7%) while strictly guaranteeing target coverage (e.g., 90%). It consistently outperforms conventional quantile regression and conformal prediction baselines. The key contribution lies in successfully adapting the unsupervised EM paradigm to supervised interval forecasting, thereby enhancing both accuracy and robustness of uncertainty quantification.

Technology Category

Application Category

📝 Abstract

This work introduces the Supervised Expectation-Maximization Framework (SEMF), a versatile and model-agnostic approach for generating prediction intervals with any ML model. SEMF extends the Expectation-Maximization algorithm, traditionally used in unsupervised learning, to a supervised context, leveraging latent variable modeling for uncertainty estimation. Through extensive empirical evaluation of diverse simulated distributions and 11 real-world tabular datasets, SEMF consistently produces narrower prediction intervals while maintaining the desired coverage probability, outperforming traditional quantile regression methods. Furthermore, without using the quantile (pinball) loss, SEMF allows point predictors, including gradient-boosted trees and neural networks, to be calibrated with conformal quantile regression. The results indicate that SEMF enhances uncertainty quantification under diverse data distributions and is particularly effective for models that otherwise struggle with inherent uncertainty representation.

Problem

Research questions and friction points this paper is trying to address.

Predicting intervals with ML models using SEMF

Extending EM algorithm for supervised uncertainty estimation

Improving uncertainty quantification across diverse data distributions

Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends EM algorithm to supervised learning

Leverages latent variables for uncertainty estimation

Calibrates point predictors without quantile loss

🔎 Similar Papers

No similar papers found.

Authors to Follow