🤖 AI Summary
This study addresses cross-wave missing data in large-scale complex surveys, where certain variables are measured only in specific years. To predict unobserved outcomes for the current population, the authors propose a weighted conformal prediction framework that combines subgroup-specific density ratio estimates with subgroup-proportion estimates to approximate the likelihood ratio between the historical and target covariate distributions. This corrects for temporal shifts in covariate distributions while accounting for the effect of complex sampling designs on representativeness. The authors establish coverage guarantees for the resulting prediction sets. Simulation studies and a real-world application predicting low-density lipoprotein cholesterol (LDL-C) levels in the U.S. population show coverage close to the nominal level and improved efficiency over existing methods.
📝 Abstract
In large-scale complex surveys such as the National Health and Nutrition Examination Survey (NHANES), some outcomes are measured only in selected years, leaving incomplete records across survey waves. We develop a weighted conformal prediction framework that enables valid population-level prediction of unobserved outcomes using information from earlier surveys. The method accommodates covariate shift, where both continuous and categorical covariate distributions evolve over time while survey design affects representativeness. It integrates subgroup-specific density ratio and subgroup-proportion estimation to approximate likelihood ratios between the historical and target covariate distributions, and we establish coverage guarantees for the resulting prediction sets. Simulation studies and an application predicting low-density lipoprotein cholesterol (LDL-C) for the current U.S. population show that the proposed approach achieves coverage close to the nominal level and improved efficiency over existing methods, particularly when covariate distributions are complex or unknown.
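To make the weighting idea concrete, the following is a minimal sketch of how subgroup proportions and a within-subgroup density ratio could be combined into likelihood-ratio weights and then used in a weighted split-conformal quantile. It is an illustration under stated assumptions, not the authors' implementation: the function names (`subgroup_proportions`, `likelihood_ratio`, `weighted_conformal_quantile`) are hypothetical, the within-subgroup density ratio is assumed to be supplied by some separate estimator, and the quantile step follows the standard weighted conformal construction, in which the test point's own weight is placed on an infinite score.

```python
import numpy as np


def subgroup_proportions(groups, n_groups, design_weights=None):
    """Estimate subgroup proportions, optionally using survey design weights.

    `groups` holds integer subgroup labels in {0, ..., n_groups - 1}.
    """
    totals = np.bincount(groups, weights=design_weights, minlength=n_groups)
    return totals / totals.sum()


def likelihood_ratio(groups, within_group_ratio, p_target, p_source):
    """Combine the two ingredients into one weight per observation.

    weight = (target proportion / source proportion of the subgroup)
             * (density ratio of the continuous covariates within the subgroup)
    `within_group_ratio` is assumed to come from a separate density ratio
    estimator that is not shown here.
    """
    groups = np.asarray(groups)
    ratio = np.asarray(within_group_ratio, dtype=float)
    return (p_target[groups] / p_source[groups]) * ratio


def weighted_conformal_quantile(scores, w_cal, w_test, alpha):
    """Weighted (1 - alpha) quantile of calibration scores.

    The test point's weight is treated as mass on +infinity, so the result
    is np.inf when the calibration mass alone cannot reach 1 - alpha.
    """
    scores = np.asarray(scores, dtype=float)
    w_cal = np.asarray(w_cal, dtype=float)
    order = np.argsort(scores)
    sorted_scores = scores[order]
    probs = w_cal[order] / (w_cal.sum() + w_test)
    cumulative = np.cumsum(probs)
    # First index whose cumulative mass reaches the target level.
    idx = np.searchsorted(cumulative, 1.0 - alpha, side="left")
    return sorted_scores[idx] if idx < len(sorted_scores) else np.inf
```

With absolute residuals from a model fitted on the historical wave as `scores`, a prediction interval for a target-population individual would be the point prediction plus or minus the returned quantile, with `w_test` set to that individual's estimated likelihood ratio. An infinite quantile signals that the calibration sample carries too little weight near that individual to support a finite interval at the requested level.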