🤖 AI Summary
Problem: Heterogeneous treatment effect (HTE) estimation has advanced considerably, but robust criteria for evaluating and comparing HTE estimators are still lacking.
Method: We propose a robust evaluation framework for HTE estimation based on relative error, which quantifies the performance difference between two HTE estimators. Theoretically, we derive the conditions on the nuisance parameters that are needed for a robust estimator of relative error and establish the large-sample properties of that estimator. Methodologically, we design dedicated loss functions and a neural network architecture that estimate the nuisance parameters; the same components support both comparative assessment of existing estimators and a new HTE learning algorithm.
Results: Extensive experiments across multiple benchmark datasets demonstrate that our framework reliably distinguishes better estimators from worse ones. The proposed learning algorithm consistently outperforms state-of-the-art methods, including R-Learner and DR-Learner, achieving statistically significant improvements. This work provides a principled, reproducible foundation for HTE model selection, validation, and development.
📝 Abstract
While significant progress has been made in heterogeneous treatment effect (HTE) estimation, the evaluation of HTE estimators remains underdeveloped. In this article, we propose a robust evaluation framework based on relative error, which quantifies the performance difference between two HTE estimators. We first derive the key theoretical conditions on the nuisance parameters that are necessary to achieve a robust estimator of relative error. Building on these conditions, we introduce novel loss functions and design a neural network architecture to estimate the nuisance parameters and obtain a robust estimate of the relative error, thereby achieving reliable evaluation of HTE estimators. We establish the large-sample properties of the proposed relative error estimator. Furthermore, beyond evaluation, we propose a new learning algorithm for HTE that leverages both previously obtained HTE estimators and the nuisance parameters learned through our neural network architecture. Extensive experiments demonstrate that our evaluation framework supports reliable comparisons across HTE estimators, and that the proposed learning algorithm for HTE exhibits desirable performance.
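The abstract does not spell out how relative error is defined. As a rough illustration only, the sketch below assumes one plausible reading: the difference between the mean squared errors of two HTE estimates, each measured against the true effect. The function name `oracle_relative_error` and the synthetic data are hypothetical and are not taken from the paper. The quantity is an "oracle" version that needs the true effect, which is known here only because the data are simulated; on real data it is unobserved, which is presumably why the paper estimates nuisance parameters instead.

```python
import numpy as np


def oracle_relative_error(tau_hat_1, tau_hat_2, tau_true):
    """Difference in mean squared error of two HTE estimates.

    Assumed definition (not confirmed by the source):
        mean((tau_hat_1 - tau)^2) - mean((tau_hat_2 - tau)^2)
    A negative value means estimator 1 is closer to the true effect.
    """
    tau_hat_1 = np.asarray(tau_hat_1, dtype=float)
    tau_hat_2 = np.asarray(tau_hat_2, dtype=float)
    tau_true = np.asarray(tau_true, dtype=float)
    err_1 = (tau_hat_1 - tau_true) ** 2
    err_2 = (tau_hat_2 - tau_true) ** 2
    return float(np.mean(err_1 - err_2))


# Synthetic example: the true effect is known only because we simulate it.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=5000)
tau_true = 1.0 + 2.0 * x  # hypothetical heterogeneous effect

# Estimator A tracks the heterogeneity with small noise.
tau_hat_a = tau_true + rng.normal(0.0, 0.1, size=x.shape)
# Estimator B ignores heterogeneity and predicts the average effect everywhere.
tau_hat_b = np.full_like(x, tau_true.mean())

delta = oracle_relative_error(tau_hat_a, tau_hat_b, tau_true)
print(f"relative error (A vs. B): {delta:.3f}")  # negative: A is better
```

Under this assumed definition the squared true effect cancels, so the quantity depends on the true effect only linearly: it equals the mean of `tau_hat_1**2 - tau_hat_2**2 - 2 * tau_true * (tau_hat_1 - tau_hat_2)`. Swapping the two estimators flips the sign, and comparing an estimator with itself gives zero.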