Spatially Robust Inference with Predicted and Missing at Random Labels

📅 2026-03-11

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses uncertainty quantification for statistical inference that relies on machine learning predictions when labels are missing at random (MAR) and the data are spatially dependent. The authors propose a doubly robust estimator with cross-fit nuisances and show that cross-fitting induces fold-level correlation that distorts spatial variance estimators. To correct this, they introduce a jackknife spatial heteroscedasticity and autocorrelation consistent (HAC) variance correction that separates spatial dependence from fold-induced noise. Under standard identification and dependence conditions, the resulting confidence intervals are asymptotically valid. Simulations and benchmark datasets show substantial improvement in finite-sample calibration, particularly under MAR labeling and clustered sampling.
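
For orientation, a doubly robust estimator of a mean outcome with cross-fit nuisances, and a kernel-weighted spatial HAC variance for it, are commonly written as below. This is the generic textbook form, not necessarily the exact estimator or the jackknife correction of the paper. All symbols are assumptions introduced here: $Y_i$ is the outcome, $R_i$ the labeling indicator, $X_i$ the covariates (which may include the ML prediction), $s_i$ the location, $\hat m$ and $\hat\pi$ the outcome and labeling-propensity models fit without the fold $k(i)$ that contains unit $i$, $K$ a kernel, and $h$ a bandwidth.

```latex
\hat\varphi_i = \hat m^{(-k(i))}(X_i)
  + \frac{R_i}{\hat\pi^{(-k(i))}(X_i)}\,\bigl(Y_i - \hat m^{(-k(i))}(X_i)\bigr),
\qquad
\hat\theta = \frac{1}{n}\sum_{i=1}^{n}\hat\varphi_i,

\hat V_{\mathrm{HAC}} = \frac{1}{n^{2}}\sum_{i=1}^{n}\sum_{j=1}^{n}
  K\!\left(\frac{\lVert s_i - s_j\rVert}{h}\right)
  (\hat\varphi_i - \hat\theta)(\hat\varphi_j - \hat\theta).
```

The estimator $\hat\theta$ stays consistent if either $\hat m$ or $\hat\pi$ is consistent, which is the sense in which it is doubly robust.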

📝 Abstract

When outcome data are expensive or onerous to collect, scientists increasingly substitute predictions from machine learning and AI models for unlabeled cases, a practice that has consequences for downstream statistical inference. While recent methods provide valid uncertainty quantification under independent sampling, real-world applications involve missing at random (MAR) labeling and spatial dependence. For inference in this setting, we propose a doubly robust estimator with cross-fit nuisances. We show that cross-fitting induces fold-level correlation that distorts spatial variance estimators, producing unstable or overly conservative confidence intervals. To address this, we propose a jackknife spatial heteroscedasticity and autocorrelation consistent (HAC) variance correction that separates spatial dependence from fold-induced noise. Under standard identification and dependence conditions, the resulting intervals are asymptotically valid. Simulations and benchmark datasets show substantial improvement in finite-sample calibration, particularly under MAR labeling and clustered sampling.
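
To make the ingredients named in the abstract concrete, here is a minimal Python sketch of a cross-fit doubly robust (AIPW) estimate of a mean outcome, followed by a plain spatial HAC standard error with a Bartlett kernel. It is an illustration under assumptions, not the authors' implementation: the function names (`fit_logistic`, `crossfit_aipw`, `spatial_hac_se`), the linear outcome model, the logistic labeling model, the propensity clipping level, and the kernel choice are all hypothetical. The jackknife correction proposed in the paper is not reproduced, since the abstract does not specify it.

```python
import numpy as np

def fit_logistic(Z, r, n_iter=25, ridge=1e-6):
    """Logistic regression by Newton's method; Z must include an intercept column."""
    beta = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Z @ beta))
        grad = Z.T @ (r - p) - ridge * beta
        hess = (Z * (p * (1.0 - p))[:, None]).T @ Z + ridge * np.eye(Z.shape[1])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

def crossfit_aipw(y, r, X, n_folds=5, seed=0, clip=0.05):
    """Cross-fit AIPW estimate of E[Y]; y is only used where r == 1.

    Assumed (hypothetical) nuisance models: least squares for the outcome,
    logistic regression for the labeling propensity.
    """
    n = len(r)
    Z = np.column_stack([np.ones(n), X])
    # Balanced random fold assignment.
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    phi = np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        labeled = train & (r == 1)
        # Nuisances are fit on the other folds only.
        coef, *_ = np.linalg.lstsq(Z[labeled], y[labeled], rcond=None)
        beta = fit_logistic(Z[train], r[train])
        m_hat = Z[test] @ coef
        pi_hat = np.clip(1.0 / (1.0 + np.exp(-Z[test] @ beta)), clip, 1.0)
        # Residual correction applies to labeled units only.
        resid = np.where(r[test] == 1, y[test] - m_hat, 0.0)
        phi[test] = m_hat + resid / pi_hat
    return phi.mean(), phi, folds

def spatial_hac_se(phi, coords, bandwidth):
    """Spatial HAC standard error of mean(phi) using a Bartlett kernel on distance."""
    n = len(phi)
    psi = phi - phi.mean()
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    weights = np.clip(1.0 - dist / bandwidth, 0.0, None)
    var = psi @ weights @ psi / n**2
    # The kernel-weighted sum is not guaranteed non-negative in two dimensions.
    return float(np.sqrt(max(var, 0.0)))
```

A typical call would be `theta, phi, folds = crossfit_aipw(y, r, X)` followed by `spatial_hac_se(phi, coords, bandwidth)`, with `coords` an `(n, 2)` array of locations. The pairwise distance matrix costs O(n²) memory, so this form suits only moderate sample sizes. The abstract's point is that scores built this way carry fold-level correlation, which is what a naive HAC estimator like the one above does not account for.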

Problem

Research questions and friction points this paper is trying to address.

missing at random

spatial dependence

statistical inference

machine learning predictions

uncertainty quantification

Innovation

Methods, ideas, or system contributions that make the work stand out.

spatial inference

missing at random

cross-fitting

doubly robust estimator

jackknife HAC
