🤖 AI Summary
This study addresses the challenge of producing fine-grained subnational estimates in humanitarian contexts, where sparse survey data are often insufficient. The authors evaluate a context-conditioned normalizing flow, a generative model that takes multisource geospatial and socioeconomic covariates as external context and learns full conditional distributions of population characteristics rather than point estimates. By drawing on this contextual information, the model can refine local population distribution estimates even under severe data scarcity, provided the sparse sample retains sufficient support. Experiments on eight household survey datasets from six low- and middle-income countries show that the approach improves subnational estimates, with performance increasing systematically as the conditioning information becomes richer.
📝 Abstract
Data scarcity limits inference in many scientific and policy domains. Survey data are essential for decision-making, but sparse samples often fail to capture fine spatial granularities. We evaluate normalizing flows, a class of generative models that learn complex data distributions and can be conditioned on exogenous contextual features, in controlled data scarcity scenarios. Across eight household survey datasets spanning six low- and middle-income countries in the humanitarian domain, we show that context-conditioned generative models can refine sub-national survey distributions under severe data scarcity, and that performance increases systematically with the richness of the conditioning information. These findings support a general principle for survey data augmentation: generative models can improve sub-national estimates when the sparse sample retains sufficient support and contextual covariates encode relevant local heterogeneity. By learning full conditional distributions rather than point estimates, the approach provides fine-grained evidence for humanitarian decision-making and resource allocation.
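To make the idea of a context-conditioned flow concrete, the following is a minimal sketch of a one-dimensional conditional affine flow. It is not the architecture used in the paper: the linear `conditioner`, the scalar context `c`, and the values in `PARAMS` are all hypothetical and chosen only for illustration. The sketch shows the two properties the abstract relies on: the model yields a full conditional density `p(x | c)` through the change-of-variables formula, and that density changes with the context.

```python
import math
import random


def conditioner(c, params):
    """Map context c to a shift and log-scale (hypothetical linear form)."""
    a, b, w, v = params
    shift = a * c + b
    log_scale = w * c + v
    return shift, log_scale


def forward(x, c, params):
    """Data -> latent: z = (x - shift(c)) / exp(log_scale(c))."""
    shift, log_scale = conditioner(c, params)
    return (x - shift) * math.exp(-log_scale)


def inverse(z, c, params):
    """Latent -> data: x = shift(c) + exp(log_scale(c)) * z."""
    shift, log_scale = conditioner(c, params)
    return shift + math.exp(log_scale) * z


def log_prob(x, c, params):
    """Change of variables: log p(x|c) = log N(z; 0, 1) + log|dz/dx|."""
    _, log_scale = conditioner(c, params)
    z = forward(x, c, params)
    log_base = -0.5 * (z * z + math.log(2.0 * math.pi))
    # For this affine map, log|dz/dx| = -log_scale(c).
    return log_base - log_scale


def sample(c, params, n, rng):
    """Draw n samples from p(x|c) by pushing base noise through the inverse."""
    return [inverse(rng.gauss(0.0, 1.0), c, params) for _ in range(n)]


# Illustrative parameter values (assumed, not fitted to any data).
PARAMS = (2.0, 1.0, 0.5, -1.0)

if __name__ == "__main__":
    rng = random.Random(0)
    for c in (0.0, 1.5):
        draws = sample(c, PARAMS, 5, rng)
        print(f"context={c}: samples={[round(d, 2) for d in draws]}")
```

In practice the conditioner would be a neural network over many covariates, several invertible layers would be stacked, and the parameters would be fitted by maximizing `log_prob` over the observed survey records. The sketch keeps only the structure: an invertible map whose parameters depend on context.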