Bayesian Variable Selection for Censored Spatial Responses with Application to PFAS Concentrations in California

📅 2025-10-04
🤖 AI Summary
To address left-censoring, strong spatial dependence, and a large candidate predictor set in PFAS concentration data from California groundwater, this paper proposes a Bayesian hierarchical model that builds censoring directly into a spatial process model. Spatial correlation is handled through an approximate Gaussian process, and a global-local shrinkage prior performs high-dimensional variable selection. Three post-selection strategies (credible interval rules, shrinkage weight thresholds, and clustering-based inclusion) are compared through cross-validation on predictive accuracy, censoring robustness, and variable selection stability. Applied to PFOS concentrations, the model identifies a concise, interpretable set of drivers, including demographic composition, industrial facility counts, proximity to airports, traffic density, herbaceous land cover, and elevation.

📝 Abstract
Per- and polyfluoroalkyl substances (PFAS) are persistent environmental pollutants of major public health concern due to their resistance to degradation, widespread presence, and potential health risks. Analyzing PFAS in groundwater is challenging due to left-censoring and strong spatial dependence. Although PFAS levels are influenced by sociodemographic, industrial, and environmental factors, the relative importance of these drivers remains unclear, highlighting the need for robust statistical tools to identify key predictors from a large candidate set. We present a Bayesian hierarchical framework that integrates censoring into a spatial process model via approximate Gaussian processes and employs a global-local shrinkage prior for high-dimensional variable selection. We evaluate three post-selection strategies, namely, credible interval rules, shrinkage weight thresholds, and clustering-based inclusion, and compare their performance in terms of predictive accuracy, censoring robustness, and variable selection stability through cross-validation. Applied to PFOS concentrations in California groundwater, the model identifies a concise, interpretable set of predictors, including demographic composition, industrial facility counts, proximity to airports, traffic density, and environmental features such as herbaceous cover and elevation. These findings demonstrate that the proposed approach delivers stable, interpretable inference in censored, spatial, high-dimensional contexts, thereby offering actionable insights into the environmental and industrial factors affecting PFAS concentrations.
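As a rough illustration of what "integrating censoring" into the likelihood can mean, the sketch below evaluates a left-censored Gaussian log-likelihood: detected values contribute a normal density, while non-detects contribute the probability of falling below their detection limit. This is a minimal sketch under stated assumptions, not the paper's implementation. The function names, the Gaussian error model (typically applied on a log-concentration scale), and the idea that `mu` already combines a regression term with a spatial effect are all assumptions introduced here.

```python
import math

def normal_logpdf(y, mu, sigma):
    """Log density of N(mu, sigma^2) evaluated at y."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def normal_logcdf(x, mu, sigma):
    """Log of P(Y <= x) for Y ~ N(mu, sigma^2), using Phi(z) = 0.5 * erfc(-z / sqrt(2))."""
    z = (x - mu) / sigma
    # Floor the probability to avoid log(0) far in the lower tail.
    return math.log(max(0.5 * math.erfc(-z / math.sqrt(2.0)), 1e-300))

def censored_loglik(y, censored, limit, mu, sigma):
    """Left-censored Gaussian log-likelihood.

    y        : observed values (ignored where censored is True)
    censored : True where the measurement fell below its detection limit
    limit    : detection limit for each observation
    mu       : mean for each observation (hypothetically, covariates plus a spatial effect)
    sigma    : common residual standard deviation
    """
    total = 0.0
    for y_i, c_i, l_i, m_i in zip(y, censored, limit, mu):
        if c_i:
            total += normal_logcdf(l_i, m_i, sigma)   # non-detect: P(Y < limit)
        else:
            total += normal_logpdf(y_i, m_i, sigma)   # detected: density at y
    return total
```

In a full Bayesian fit, this quantity would be combined with priors on the coefficients and the spatial process; here it only shows how the two kinds of observations enter the likelihood differently.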
Problem

Research questions and friction points this paper is trying to address.

Identifying key predictors of PFAS concentrations from many candidates
Handling left-censored data with strong spatial dependence in groundwater
Developing robust Bayesian methods for high-dimensional variable selection
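To make the variable selection problem above more concrete, the following sketch draws coefficients from a horseshoe prior, one well-known member of the global-local shrinkage family. The summary does not say which global-local prior the paper uses, so the horseshoe, the function names, and the unit-scale shrinkage weight `kappa = 1 / (1 + tau^2 * lambda^2)` are assumptions chosen for illustration only.

```python
import math
import random

def half_cauchy(rng, scale=1.0):
    """Draw from a half-Cauchy(0, scale) distribution via its inverse CDF."""
    return scale * math.tan(0.5 * math.pi * rng.random())

def draw_horseshoe(p, rng, tau_scale=1.0):
    """Draw p coefficients from a horseshoe prior (illustrative assumption).

    tau       : global scale shared by all coefficients
    lambdas   : local scales, one per coefficient
    betas     : beta_j ~ N(0, tau^2 * lambda_j^2)
    kappas    : shrinkage weights in [0, 1]; values near 1 mean heavy shrinkage
    """
    tau = half_cauchy(rng, tau_scale)
    lambdas = [half_cauchy(rng) for _ in range(p)]
    betas = [rng.gauss(0.0, tau * lam) for lam in lambdas]
    kappas = [1.0 / (1.0 + (tau * lam) ** 2) for lam in lambdas]
    return tau, lambdas, betas, kappas

# Example: one prior draw for 10 hypothetical candidate predictors.
tau, lambdas, betas, kappas = draw_horseshoe(10, random.Random(0))
```

The global scale pulls every coefficient toward zero, while the heavy-tailed local scales let a few coefficients escape that shrinkage, which is the behavior that makes this family attractive when only a handful of many candidate predictors matter.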
Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian hierarchical framework integrates censoring into spatial models
Uses global-local shrinkage prior for high-dimensional variable selection
Compares three post-selection strategies on predictive accuracy, censoring robustness, and variable selection stability
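The three post-selection strategies named in the abstract (credible interval rules, shrinkage weight thresholds, and clustering-based inclusion) can be sketched as simple functions of posterior draws. The versions below are illustrative assumptions rather than the paper's exact rules: the 95% level, the 0.5 threshold, the use of 2-means on absolute posterior medians, and all function names are hypothetical choices made here.

```python
import statistics

def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def select_by_interval(draws, level=0.95):
    """Keep coefficient j if its equal-tailed credible interval excludes zero.

    draws[j] is a list of posterior draws for coefficient j.
    """
    alpha = (1.0 - level) / 2.0
    keep = []
    for j, d in enumerate(draws):
        s = sorted(d)
        lower, upper = quantile(s, alpha), quantile(s, 1.0 - alpha)
        if lower > 0.0 or upper < 0.0:
            keep.append(j)
    return keep

def select_by_shrinkage(kappa_draws, threshold=0.5):
    """Keep coefficient j if 1 - mean(kappa_j) exceeds the threshold.

    kappa_draws[j] holds draws of a shrinkage weight in [0, 1] (1 = fully shrunk).
    """
    return [j for j, k in enumerate(kappa_draws)
            if 1.0 - statistics.fmean(k) > threshold]

def select_by_two_means(draws, iters=50):
    """Cluster |posterior median| into two groups; keep the larger-magnitude group."""
    mags = [abs(statistics.median(d)) for d in draws]
    lo_c, hi_c = min(mags), max(mags)
    for _ in range(iters):
        signal = [m for m in mags if abs(m - hi_c) < abs(m - lo_c)]
        noise = [m for m in mags if abs(m - hi_c) >= abs(m - lo_c)]
        if not signal or not noise:
            break
        lo_c, hi_c = statistics.fmean(noise), statistics.fmean(signal)
    return [j for j, m in enumerate(mags) if abs(m - hi_c) < abs(m - lo_c)]
```

Each function returns the indices of retained predictors, so the selected sets from the three rules can be compared across cross-validation folds to gauge selection stability.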