Debiasing Machine Learning Predictions for Causal Inference Without Additional Ground Truth Data: "One Map, Many Trials" in Satellite-Driven Poverty Analysis

📅 2025-08-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Machine learning models trained on satellite imagery to predict household wealth suffer from mean-shrinkage bias induced by standard regression objectives, which attenuates estimated causal treatment effects and limits their use in policy evaluation. Existing debiasing methods require additional ground-truth labels, which are often unavailable in data-scarce settings. This paper proposes two label-free calibration methods, linear calibration and Tweedie's correction, that correct the attenuation without modifying model training or conducting new field surveys. Combining holdout-based calibration, empirical Bayes estimation, and analysis of model score functions, the approach is validated on Demographic and Health Surveys (DHS) data. Experiments show that the proposed methods match or surpass baseline approaches that require model retraining or auxiliary labels in causal-effect estimation accuracy. They substantially improve the reliability of repeated wealth-map-driven policy evaluations under label scarcity, enabling a "one map, many trials" paradigm for causal inference.

📝 Abstract
Machine learning models trained on Earth observation data, such as satellite imagery, have demonstrated significant promise in predicting household-level wealth indices, enabling the creation of high-resolution wealth maps that can be leveraged across multiple causal trials. However, because standard training objectives prioritize overall predictive accuracy, these predictions inherently suffer from shrinkage toward the mean, leading to attenuated estimates of causal treatment effects and limiting their utility in policy. Existing debiasing methods, such as Prediction-Powered Inference, can handle this attenuation bias but require additional fresh ground-truth data at the downstream stage of causal inference, which restricts their applicability in data-scarce environments. Here, we introduce and evaluate two correction methods -- linear calibration correction and Tweedie's correction -- that substantially reduce prediction bias without relying on newly collected labeled data. Linear calibration corrects bias through a straightforward linear transformation derived from held-out calibration data, whereas Tweedie's correction leverages empirical Bayes principles to directly address shrinkage-induced biases by exploiting score functions derived from the model's learning patterns. Through analytical exercises and experiments using Demographic and Health Survey data, we demonstrate that the proposed methods meet or outperform existing approaches that either require (a) adjustments to training pipelines or (b) additional labeled data. These approaches may represent a promising avenue for improving the reliability of causal inference when direct outcome measures are limited or unavailable, enabling a "one map, many trials" paradigm where a single upstream data creation team produces predictions usable by many downstream teams across diverse ML pipelines.
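The abstract's linear calibration is "a straightforward linear transformation derived from held-out calibration data." As a rough illustration of what that involves, the sketch below fits a linear map from predictions to labels on a holdout split and applies it to new predictions; it uses synthetic data, and the variable names and shrinkage factor are invented for the example rather than taken from the paper.

```python
import numpy as np

def linear_calibration(preds_holdout, labels_holdout, preds_new):
    """Fit labels ~ a + b * preds on a held-out calibration split, then
    apply the fitted map to new predictions to undo the shrinkage toward
    the mean induced by the training objective."""
    b, a = np.polyfit(preds_holdout, labels_holdout, deg=1)  # slope, intercept
    return a + b * preds_new

# Synthetic example: predictions attenuated toward the mean by a factor 0.6.
rng = np.random.default_rng(0)
truth = rng.normal(0.0, 1.0, size=2000)
preds = 0.6 * truth + rng.normal(0.0, 0.1, size=2000)

# Use the first half as the calibration split; correct the second half.
calibrated = linear_calibration(preds[:1000], truth[:1000], preds[1000:])
```

Because the fitted slope is roughly 1/0.6, the calibrated predictions recover approximately the spread of the true outcomes, which is what keeps downstream treatment-effect estimates from being attenuated.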
Problem

Research questions and friction points this paper is trying to address.

Reduce shrinkage bias in ML poverty predictions
Debias causal estimates without new ground truth data
Improve reliability of satellite-based wealth analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear calibration corrects bias without new data
Tweedie's correction uses empirical Bayes principles
Debiasing methods improve causal inference reliability
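The Tweedie bullet refers to empirical Bayes estimation via Tweedie's formula, which in the Gaussian-noise case reads E[theta | z] = z + sigma^2 * d/dz log f(z), where f is the marginal density of the observed values. The sketch below implements that classic form with a kernel density estimate on synthetic data; it illustrates the principle only, not the paper's estimator, and the noise level and bandwidth are assumed.

```python
import numpy as np

def tweedie_correct(z, sigma2, bandwidth=0.3):
    """Tweedie's formula under Gaussian noise:
    E[theta | z] = z + sigma2 * d/dz log f(z),
    with the marginal density f estimated by a Gaussian kernel over z."""
    diffs = z[:, None] - z[None, :]                    # pairwise differences
    kern = np.exp(-0.5 * (diffs / bandwidth) ** 2)     # Gaussian kernel weights
    f = kern.sum(axis=1)                               # unnormalised density f(z_i)
    df = (-diffs / bandwidth**2 * kern).sum(axis=1)    # its derivative f'(z_i)
    return z + sigma2 * df / f                         # score = f' / f

# Synthetic example: latent values observed with Gaussian noise (sd 0.5).
rng = np.random.default_rng(1)
theta = rng.normal(0.0, 1.0, size=2000)
z = theta + rng.normal(0.0, 0.5, size=2000)
corrected = tweedie_correct(z, sigma2=0.25)
```

On this toy problem the correction pulls each noisy observation toward regions where observations are dense, lowering mean squared error against the latent values without using any labels at the correction stage.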