Multiply Robust Conformal Risk Control with Coarsened Data

📅 2025-08-21
🤖 AI Summary
This paper addresses a gap in conformal prediction: most methods assume fully observed training data, yet in practice training units may be missing the outcome, a subset of covariates, or both, and time-to-event outcomes may be censored. The authors propose a conformal risk control framework grounded in semiparametric theory. The method derives the efficient influence function of the outcome quantile under a semiparametric model for the coarsened data and combines it with a novel conformal risk control procedure, which allows flexible machine learning models (e.g., random forests) to estimate the nuisance functions. Applied to covariate shift, the framework yields prediction intervals with stronger coverage properties; applied to monotone missingness, it yields multiply robust prediction sets. Simulation studies illustrate the performance of the methods.

📝 Abstract
Conformal Prediction (CP) has recently received a tremendous amount of interest, leading to a wide range of new theoretical and methodological results for predictive inference with formal theoretical guarantees. However, the vast majority of CP methods assume that all units in the training data have fully observed data on both the outcome and covariates of primary interest, an assumption that rarely holds in practice. In reality, training data are often missing the outcome, a subset of covariates, or both on some units. In addition, time-to-event outcomes in the training set may be censored due to dropout or administrative end-of-follow-up. Accurately accounting for such coarsened data in the training sample while fulfilling the primary objective of well-calibrated conformal predictive inference requires robustness and efficiency considerations. In this paper, we consider the general problem of obtaining distribution-free valid prediction regions for an outcome given coarsened training data. Leveraging modern semiparametric theory, we achieve our goal by deriving the efficient influence function of the quantile of the outcome we aim to predict, under a given semiparametric model for the coarsened data, carefully combined with a novel conformal risk control procedure. Our principled use of semiparametric theory has the key advantage of facilitating flexible machine learning methods such as random forests to learn the underlying nuisance functions of the semiparametric model. A straightforward application of the proposed general framework produces prediction intervals with stronger coverage properties under covariate shift, as well as the construction of multiply robust prediction sets in monotone missingness scenarios. We further illustrate the performance of our methods through various simulation studies.
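
To make the setting concrete, here is a minimal sketch of a much simpler baseline than the procedure described in the abstract: split conformal calibration with inverse-probability weighting when some calibration outcomes are missing. It is not the paper's multiply robust, influence-function-based method. It assumes outcomes are missing at random given covariates and that the observation probability is known or has been estimated separately. The function name `ipw_conformal_threshold` and all argument names are hypothetical, and no finite-sample guarantee is claimed for this sketch.

```python
import numpy as np


def ipw_conformal_threshold(scores, observed, propensity, alpha):
    """Inverse-probability-weighted (1 - alpha) quantile of calibration scores.

    scores     : nonconformity scores, e.g. |y - prediction| (NaN where y is missing)
    observed   : boolean mask, True where the outcome was observed
    propensity : assumed P(outcome observed | covariates) for each unit
    alpha      : target miscoverage level in (0, 1)
    """
    scores = np.asarray(scores, dtype=float)
    observed = np.asarray(observed, dtype=bool)
    propensity = np.asarray(propensity, dtype=float)

    # Keep only units with an observed outcome and up-weight each one by
    # 1 / propensity so that it also stands in for similar unobserved units.
    s = scores[observed]
    w = 1.0 / propensity[observed]

    # Sort scores and build the normalised weighted empirical CDF.
    order = np.argsort(s)
    s, w = s[order], w[order]
    cum = np.cumsum(w)
    cum = cum / cum[-1]

    # Smallest score whose weighted CDF reaches 1 - alpha.
    idx = np.searchsorted(cum, 1.0 - alpha, side="left")
    return s[min(idx, len(s) - 1)]


# Toy calibration set: the third unit has a missing outcome.
scores = np.array([1.0, 2.0, np.nan, 4.0])
observed = np.array([True, True, False, True])
propensity = np.array([0.5, 0.5, 0.5, 1.0])
q = ipw_conformal_threshold(scores, observed, propensity, alpha=0.1)
```

For a new unit with point prediction `mu_hat`, the resulting interval would be `[mu_hat - q, mu_hat + q]`. A misspecified propensity model biases this weighted quantile, which is the kind of sensitivity that robust, influence-function-based constructions are designed to reduce.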

Problem

Research questions and friction points this paper is trying to address.

Develops robust conformal prediction for coarsened training data
Addresses missing outcomes and censored time-to-event data
Creates distribution-free valid prediction regions with coverage guarantees

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multiply robust conformal risk control
Efficient influence function for quantile prediction
Semiparametric theory with machine learning integration