🤖 AI Summary
Problem: Sparse sampling in national forest inventories (NFIs) leads to low accuracy in small-area (e.g., county-level) forest biomass estimation. Method: We propose and evaluate a two-stage, unit-level Bayesian hierarchical model that jointly accommodates zero inflation and positive skewness by integrating a zero-inflated Gamma–Poisson mixture structure with spatial random effects. Contribution/Results: This work presents the first systematic comparison of zero-inflated two-stage Bayesian models against single-stage and frequentist alternatives for small-area estimation. We use unit-level cross-validation as an alternative to conventional simulation-based validation, which lowers the computational cost of model selection. Empirical evaluation across Washington and Nevada demonstrates that our model reduces county-level biomass estimation RMSE by 23% on average. Moreover, unit-level cross-validation yields results highly consistent with simulation-based validation, supporting the model’s reliability and practical utility.
📝 Abstract
National Forest Inventory (NFI) data are typically limited to sparse networks of sample locations due to cost constraints. While traditional design-based estimators provide reliable forest parameter estimates for large areas, there is increasing interest in model-based small area estimation (SAE) methods to improve precision for smaller spatial, temporal, or biophysical domains. SAE methods can be broadly categorized into area- and unit-level models; unit-level models offer greater flexibility and are therefore the focus of this study. Ensuring valid inference requires satisfying model distributional assumptions, which is particularly challenging for NFI variables that exhibit positive support and zero inflation, such as forest biomass, carbon, and volume. Here, we evaluate a class of two-stage unit-level hierarchical Bayesian models for estimating forest biomass at the county level in Washington and Nevada, United States. We compare these models to simpler Bayesian single-stage and two-stage frequentist approaches. To assess estimator performance, we employ simulated populations and cross-validation techniques. Results indicate that the most reliable county-level estimates come from small area estimators that combine a two-stage approach to account for zero inflation with county-specific random intercepts and residual variances and with spatial random effects. Additionally, findings suggest that unit-level cross-validation within the training dataset is as effective as area-level validation using simulated populations for model selection. We also illustrate the usefulness of simulated populations for better assessing the qualities of the various estimators considered.
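The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' Bayesian model: it has no covariates, no spatial random effects, and no posterior inference. Stage 1 estimates the probability that a plot has nonzero biomass, stage 2 estimates the mean biomass of the nonzero plots, and the county estimate is their product. Both stages are shrunk toward pooled values as a crude stand-in for county-specific random intercepts. The function name `two_stage_estimates`, the pseudo-count `k`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def two_stage_estimates(y, county, k=5.0):
    """Hurdle-style county means with crude shrinkage toward pooled values.

    k is a hypothetical pseudo-count: larger k pulls each county's
    stage-1 and stage-2 estimates harder toward the pooled estimates.
    """
    y = np.asarray(y, dtype=float)
    county = np.asarray(county)
    positive = y > 0

    # Pooled estimates used as shrinkage targets.
    p_pool = positive.mean()                                  # P(y > 0)
    mu_pool = y[positive].mean() if positive.any() else 0.0   # E[y | y > 0]

    estimates = {}
    for c in np.unique(county):
        in_c = county == c
        n = in_c.sum()
        n_pos = (in_c & positive).sum()
        # Stage 1: shrunken probability of nonzero biomass in county c.
        p_c = (n_pos + k * p_pool) / (n + k)
        # Stage 2: shrunken mean of the positive observations in county c.
        mu_c = (y[in_c & positive].sum() + k * mu_pool) / (n_pos + k)
        # Combine the stages: E[y] = P(y > 0) * E[y | y > 0].
        estimates[str(c)] = float(p_c * mu_c)
    return estimates

# Synthetic zero-inflated plot data for two hypothetical counties.
rng = np.random.default_rng(0)
county = np.repeat(["A", "B"], [30, 8])
forested = rng.random(county.size) < np.where(county == "A", 0.8, 0.2)
y = np.where(forested, rng.gamma(2.0, 50.0, size=county.size), 0.0)
estimates = two_stage_estimates(y, county)
```

With `k=0` and no covariates, the product of the two stages reduces to the ordinary county sample mean (provided the county has at least one nonzero plot), so the benefit of a two-stage estimator comes from modeling each stage with covariates and random effects, as the abstract describes.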