🤖 AI Summary
This paper addresses estimation bias induced by preferential sampling in geostatistics and investigates the conditions under which the sampling mechanism can be ignored within the Diggle et al. (2010) framework. Likelihood-based approaches are computationally intensive and sensitive to model misspecification. As an alternative, the paper derives, through theoretical analysis and simulation studies, sufficient conditions for non-likelihood estimators (e.g., weighted least squares, method-of-moments estimators) to remain unbiased and consistent: specifically, when covariates are orthogonal to spatial residuals and satisfy certain mixing conditions. Under these conditions, explicit modeling of the sampling mechanism is unnecessary for robust inference. In simulations, the proposed approach substantially reduces computational cost and improves confidence interval coverage, and it is applied to tropical forest carbon stock data. The result is a lightweight approach to estimation under preferential sampling that does not require a parametric model of the sampling mechanism.
📝 Abstract
Preferential sampling has attracted considerable attention in geostatistics since the pioneering work of Diggle et al. (2010). A variety of likelihood-based approaches have been developed to correct estimation bias by explicitly modelling the sampling mechanism. While effective in many applications, these methods are often computationally expensive and can be susceptible to model misspecification. In this paper, we present a surprising finding: some existing non-likelihood-based methods that ignore preferential sampling can still produce unbiased and consistent estimators under the widely used framework of Diggle et al. (2010) and its extensions. We investigate the conditions under which preferential sampling can be ignored and develop relevant estimators for both regression and covariance parameters without specifying the sampling mechanism parametrically. Simulation studies demonstrate clear advantages of our approach, including reduced estimation error, improved confidence interval coverage, and substantially lower computational cost. To show the practical utility, we further apply it to a tropical forest data set.