Doubly Robust Machine Learning for Population Size Estimation with Missing Covariates: Application to Gaza Conflict Mortality

📅 2026-02-10
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses missing covariates in capture-recapture analyses, which arise from reluctance to share sensitive information, data collection limitations, and imperfect record linkage, and which can substantially bias population size estimates for hard-to-reach groups such as conflict-related fatalities. Under a Missing at Random assumption, the authors develop a nonparametric identification framework and propose one-step estimators that are doubly robust, approximately achieve the nonparametric efficiency bound, and provide approximately valid inference in finite samples. By combining semiparametric efficiency theory with flexible machine learning methods, the approach maintains valid inference even at high missingness rates. Simulations show substantial improvements over naive imputation approaches. Applied to three-list capture-recapture data on mortality in the Gaza Strip from October 7, 2023, to June 30, 2024, the estimators indicate that the true death toll exceeds official statistics by approximately 26%, with estimates that are more conservative yet precise compared to previous methods.
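To make the term "doubly robust" concrete, here is a minimal toy sketch in Python. It is not the paper's estimator: it illustrates the generic augmented inverse-probability-weighted (AIPW) estimator of a mean when an outcome is Missing at Random given a fully observed binary covariate. The data-generating process, the function names (`simulate`, `fit_by_stratum`, `aipw_mean`), and all numeric settings are assumptions made for illustration only.

```python
import random


def simulate(n, seed=0):
    """Toy data (an assumption, not from the paper): binary covariate x,
    outcome y, and indicator r (1 = y observed) that depends only on x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)  # true E[y] = 2.0
        p_obs = 0.3 if x == 1 else 0.9           # Missing at Random given x
        r = 1 if rng.random() < p_obs else 0
        rows.append((x, y, r))
    return rows


def fit_by_stratum(rows):
    """Nuisance estimates per value of x: observation rate pi_hat[x]
    and mean of the observed outcomes m_hat[x]."""
    pi_hat, m_hat = {}, {}
    for x in (0, 1):
        stratum = [(y, r) for (xx, y, r) in rows if xx == x]
        observed = [y for (y, r) in stratum if r == 1]
        pi_hat[x] = len(observed) / len(stratum)
        m_hat[x] = sum(observed) / len(observed)
    return pi_hat, m_hat


def aipw_mean(rows, pi, m):
    """AIPW estimate of E[y]: outcome-model prediction plus an
    inverse-probability-weighted residual correction."""
    total = 0.0
    for x, y, r in rows:
        # The residual term is zero whenever r == 0, so y is only
        # actually used for observed records.
        total += m[x] + r * (y - m[x]) / pi[x]
    return total / len(rows)


rows = simulate(20000)
pi_hat, m_hat = fit_by_stratum(rows)

# Naive baseline: average the observed outcomes and ignore missingness.
observed_y = [y for (_, y, r) in rows if r == 1]
complete_case = sum(observed_y) / len(observed_y)

# Deliberately wrong outcome model (all zeros), estimated propensity.
dr_wrong_outcome = aipw_mean(rows, pi_hat, {0: 0.0, 1: 0.0})

# Deliberately wrong propensity (constant 0.5), estimated outcome model.
dr_wrong_propensity = aipw_mean(rows, {0: 0.5, 1: 0.5}, m_hat)

print(f"complete-case mean:       {complete_case:.3f}")
print(f"AIPW, wrong outcome:      {dr_wrong_outcome:.3f}")
print(f"AIPW, wrong propensity:   {dr_wrong_propensity:.3f}")
```

In this toy setup the complete-case mean is pulled well below the true value of 2.0, while both AIPW variants stay close to it even though one of their two nuisance models is deliberately misspecified. That is the sense in which such estimators are called doubly robust.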

📝 Abstract
Population size estimation from capture-recapture data is central for studying hard-to-reach populations, incorporating auxiliary covariates to account for heterogeneous capture probabilities and recapture dependencies. However, missing attributes pose a critical methodological challenge due to reluctance to share sensitive information, data collection limitations, and imperfect record linkage. Existing approaches either ignore missingness or rely on a priori imputation, potentially introducing substantial bias. In this work, we develop a novel nonparametric estimation framework using a Missing at Random assumption to identify capture probabilities under missing covariates. Using semiparametric efficiency theory, we construct one-step estimators that combine efficiency, robustness, and finite-sample validity: they approximately achieve the nonparametric efficiency bound, accommodate flexible machine learning methods through a doubly robust structure, and provide approximately valid inference for any sample size. Simulations demonstrate substantial improvements over naive imputation approaches, with our doubly robust ML estimators maintaining valid inference even at high missingness rates where competing methods fail. We apply our methodology to re-estimate mortality in the Gaza Strip from October 7, 2023, to June 30, 2024, using three-list capture-recapture data with missing demographic information. Our approach yields more conservative yet precise estimates compared to previous methods, indicating the true death toll exceeds official statistics by approximately 26%. Our framework provides practitioners with principled tools for handling incomplete data in conflict settings and other applications with hard-to-reach populations.
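As background for why covariates matter for heterogeneous capture probabilities, the following sketch uses the classical two-list Lincoln-Petersen estimator, applied within strata and then pooled. This is a textbook illustration, not the paper's three-list doubly robust method. The strata, the counts, and the names `Stratum`, `lincoln_petersen`, `stratified_estimate`, and `pooled_estimate` are hypothetical, and the estimator assumes the two lists are independent within each stratum.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    name: str
    n1: int  # records on list 1
    n2: int  # records on list 2
    m: int   # records appearing on both lists


def lincoln_petersen(n1, n2, m):
    """Classical two-list estimate N = n1 * n2 / m, valid when the
    lists are independent and capture probabilities are homogeneous."""
    if m == 0:
        raise ValueError("no overlap between lists; estimate is undefined")
    return n1 * n2 / m


def stratified_estimate(strata):
    """Estimate each stratum separately, then sum the stratum totals."""
    return sum(lincoln_petersen(s.n1, s.n2, s.m) for s in strata)


def pooled_estimate(strata):
    """Ignore the covariate: add up the counts first, then estimate once."""
    n1 = sum(s.n1 for s in strata)
    n2 = sum(s.n2 for s in strata)
    m = sum(s.m for s in strata)
    return lincoln_petersen(n1, n2, m)


# Hypothetical counts: group A is well covered by both lists,
# group B is poorly covered.
strata = [
    Stratum("A", n1=400, n2=300, m=200),
    Stratum("B", n1=100, n2=50, m=10),
]

stratified = stratified_estimate(strata)  # 600 + 500
pooled = pooled_estimate(strata)          # 500 * 350 / 210

print(f"stratified estimate: {stratified:.1f}")
print(f"pooled estimate:     {pooled:.1f}")
```

With these hypothetical counts the pooled estimate is noticeably smaller than the stratified one, because the poorly covered group contributes few overlaps. Stratifying requires knowing every record's covariate value, which is exactly what fails when covariates are missing.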
Problem

Research questions and friction points this paper is trying to address.

population size estimation
missing covariates
capture-recapture
hard-to-reach populations
conflict mortality
Innovation

Methods, ideas, or system contributions that make the work stand out.

doubly robust
capture-recapture
missing covariates
machine learning
semiparametric efficiency