🤖 AI Summary
This paper addresses regression prediction when target-domain labels are scarce (e.g., few-shot clinical settings) by constructing reliable conformal prediction sets that leverage abundant labeled data from a distributionally shifted source domain. Methodologically, it (1) matches the source and target covariate distributions within a finite B-spline space to identify the likelihood ratio between domains; (2) uses that ratio to reweight the source conditional density so that it approximates the target conditional density; and (3) combines quantile-process modeling with highest-density region construction to handle non-exchangeability, asymmetric errors, and multimodal residuals. Theoretically, the approach guarantees exact marginal coverage under mild assumptions. Empirically, in simulation studies and a real-data application to MIMIC-III, it achieves significantly tighter prediction intervals than state-of-the-art baselines while maintaining nominal coverage, even with as few as tens of target samples, demonstrating strong practical utility in low-label regimes.
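To illustrate why highest-density regions suit multimodal errors, here is a minimal sketch, not the paper's implementation. It assumes a conditional density has already been evaluated on an equally spaced grid; the function names `normal_pdf` and `highest_density_set` and the bimodal toy density are hypothetical choices made for this example only.

```python
import math

def normal_pdf(y, mean, sd):
    """Normal density, used only to build the toy bimodal example."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def highest_density_set(grid, density, alpha):
    """Return intervals that hold the highest-density (1 - alpha) mass.

    grid    : equally spaced, increasing y values
    density : density value at each grid point
    alpha   : miscoverage level in (0, 1)
    """
    step = grid[1] - grid[0]
    total = sum(density) * step
    # Add grid cells from highest to lowest density until enough mass is kept.
    order = sorted(range(len(grid)), key=lambda i: density[i], reverse=True)
    kept, mass = set(), 0.0
    for i in order:
        kept.add(i)
        mass += density[i] * step / total
        if mass >= 1.0 - alpha:
            break
    # Merge neighbouring kept cells into (lower, upper) intervals.
    runs = []
    for i in sorted(kept):
        if runs and i == runs[-1][1] + 1:
            runs[-1][1] = i
        else:
            runs.append([i, i])
    return [(grid[a], grid[b]) for a, b in runs]

# Toy bimodal density (assumed): equal mixture of N(-2, 0.5^2) and N(2, 0.5^2).
grid = [-5.0 + 0.01 * i for i in range(1001)]
density = [0.5 * normal_pdf(y, -2.0, 0.5) + 0.5 * normal_pdf(y, 2.0, 0.5)
           for y in grid]
regions = highest_density_set(grid, density, alpha=0.1)
print(regions)
```

For this toy density the set splits into one interval around each mode, whereas a single equal-tailed interval would also have to span the low-density gap between them.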
📝 Abstract
In real-world applications, the limited availability of labeled outcomes presents significant challenges for statistical inference due to high collection costs, technical barriers, and other constraints. In this work, we propose a method to construct efficient conformal prediction sets for new target outcomes by leveraging a source distribution that is distinct from the target but related through a distributional shift assumption and provides abundant labeled data. When the target data are fully unlabeled, our predictions rely solely on the source distribution, whereas partial target labels, when available, are integrated to improve efficiency. To address the challenges of data non-exchangeability and distribution non-identifiability, we identify the likelihood ratio by matching the covariate distributions of the source and target domains within a finite B-spline space. To accommodate complex error structures such as asymmetry and multimodality, our method constructs highest predictive density sets using a novel weight-adjusted conditional density estimator. This estimator models the source conditional density along a quantile process and transforms it, through appropriate weighting adjustments, to approximate the target conditional density. We establish the theoretical properties of the proposed method and evaluate its finite-sample performance through simulation studies and a real-data application to the MIMIC-III clinical database.
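The weight-adjusted density idea can be sketched as follows, under stated assumptions rather than as the authors' estimator. The sketch uses the identity f(Q(τ)) = 1 / Q′(τ) to turn a grid of source conditional quantiles into a density by finite differences, then multiplies by a likelihood-ratio weight and renormalizes. The helper names, the known weight function exp(y − 0.5), and the Gaussian toy setting (source N(0, 1), target N(1, 1)) are all hypothetical; in practice the quantiles and the weight would be estimated from data.

```python
import math
from statistics import NormalDist

def density_from_quantiles(levels, quantiles):
    """Finite-difference density along a quantile curve.

    Because f(Q(tau)) = 1 / Q'(tau), the cell between two neighbouring
    quantiles gets density (tau_next - tau) / (q_next - q) at its midpoint.
    """
    mids, dens, widths = [], [], []
    for k in range(len(levels) - 1):
        width = quantiles[k + 1] - quantiles[k]
        mids.append(0.5 * (quantiles[k] + quantiles[k + 1]))
        dens.append((levels[k + 1] - levels[k]) / width)
        widths.append(width)
    return mids, dens, widths

def reweight(mids, dens, widths, weight):
    """Tilt a source density by weight(y) and renormalise it to integrate to 1."""
    tilted = [weight(y) * f for y, f in zip(mids, dens)]
    mass = sum(f * w for f, w in zip(tilted, widths))
    return [f / mass for f in tilted]

# Toy setting (assumed): source Y | x ~ N(0, 1) and target Y | x ~ N(1, 1),
# so the true density ratio is exp(y - 0.5).
source = NormalDist(0.0, 1.0)
levels = [k / 1000 for k in range(1, 1000)]        # tau = 0.001, ..., 0.999
quantiles = [source.inv_cdf(t) for t in levels]    # stand-in for fitted quantiles

mids, src_dens, widths = density_from_quantiles(levels, quantiles)
tgt_dens = reweight(mids, src_dens, widths, lambda y: math.exp(y - 0.5))

# Compare against the known target density in this toy setting.
target = NormalDist(1.0, 1.0)
max_err = max(abs(f - target.pdf(y)) for y, f in zip(mids, tgt_dens))
mode = mids[max(range(len(mids)), key=lambda i: tgt_dens[i])]
print(max_err, mode)
```

Because the quantile grid stops at τ = 0.001 and 0.999, the reweighted density is only defined between the extreme source quantiles, so some target tail mass is truncated before renormalization. A finer or wider grid of levels reduces that effect.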