🤖 AI Summary
This paper addresses the problem of reliably ranking individuals by causal effect when individual-level effects cannot be directly estimated, leveraging prediction-model outputs (e.g., conversion probabilities). It proposes three formal conditions -- full latent moderation, full latent mediation, and latent monotonicity -- that characterize when prediction scores can serve as valid proxies for a latent effect moderator and thereby recover causal rankings. When these conditions hold, score-based ranking recovers causal heterogeneity with high fidelity and can even outperform direct causal estimation. The approach integrates the potential-outcomes framework with moderator/mediator analysis and monotonicity testing, and provides practical guidelines for assessing feasibility without requiring new intervention data or immediate outcome feedback. By relaxing reliance on large-scale randomized experiments and real-time outcome data, the approach is broadly applicable to canonical intervention settings such as online advertising and user retention.
📝 Abstract
Predictive models that estimate outcome probabilities are widely used to guide interventions in applications such as advertising, customer retention, and behavioral nudging. Although these outcome probabilities do not measure causal effects, they are often treated as proxies for identifying individuals with the highest intervention impact. We investigate when and why these predictions (which we refer to as scores) can reliably rank individuals by their causal effects in settings where direct effect estimation is infeasible. The key mechanism underlying this approach is that scores serve as proxies for a latent moderator that drives variation in causal effects. Building on this foundation, we introduce three key conditions -- full latent moderation, full latent mediation, and latent monotonicity -- that determine when scores can recover causal-effect rankings and, in some cases, even outperform direct effect estimation. To support practical applications, we provide guidelines for assessing when scores are viable proxies, particularly in contexts lacking data on new interventions or with delayed outcome measurements. Our findings demonstrate that effect heterogeneity can be leveraged through predictive modeling when the target variable being modeled captures a strong latent moderator, expanding the scope of causal inference beyond traditional effect estimation and, in some cases, reducing the need for large-scale randomized experiments.
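The proxy mechanism described in the abstract can be illustrated with a minimal simulation. This is a sketch under illustrative assumptions, not the paper's actual data-generating process: a latent moderator `u` drives both the baseline outcome probability (which a predictive model would estimate as its score) and the size of the individual causal effect, so ranking by the score approximately recovers the causal-effect ranking even though the score never measures the effect itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Latent moderator: drives both baseline outcome probability and effect size.
u = rng.uniform(0.0, 1.0, n)

# Potential-outcome probabilities: the treatment lifts the outcome
# probability by an individual effect tau(u) that is monotone in u
# (the "latent monotonicity" condition, in this toy setup).
p0 = 0.10 + 0.30 * u          # baseline P(Y=1 | no treatment)
tau = 0.05 + 0.25 * u         # individual causal effect
p1 = p0 + tau                 # P(Y=1 | treatment)

# A predictive model trained on untreated outcomes would estimate p0;
# we use a noisy version of p0 as the model's "score".
score = p0 + rng.normal(0.0, 0.02, n)

def rank(x):
    """Ranks of x (0 = smallest); ties are negligible for continuous data."""
    r = np.empty(len(x), dtype=float)
    r[np.argsort(x)] = np.arange(len(x))
    return r

# Spearman rank correlation between the score-based ranking and the
# true causal-effect ranking: high, despite the score not estimating tau.
rho = np.corrcoef(rank(score), rank(tau))[0, 1]
print(f"rank correlation(score, effect) = {rho:.3f}")
```

If the score instead tracked a variable unrelated to the moderator, the same rank correlation would be near zero; the sketch only shows why scores can work when the modeled target captures the latent moderator, as the conditions above require.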