🤖 AI Summary
Traditional Gaussian processes struggle to model stochastic simulators with discrete, heteroskedastic, or non-Gaussian outputs. This work proposes a scalable Generalized Deep Gaussian Process (GDGP) framework that unifies the treatment of diverse non-Gaussian responses, such as Poisson, negative binomial, and categorical, through a latent Gaussian process structure. By integrating the Vecchia approximation, the approach enables efficient Bayesian inference for large numbers of input locations and replicated simulations. GDGP represents the first extension of deep Gaussian processes to general non-Gaussian output settings, substantially enhancing the ability to model complex, non-stationary simulation systems. Empirical evaluations on both synthetic and real-world case studies demonstrate the superior performance of GDGP, and the accompanying R package dgpsi is publicly released to facilitate broader application.
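To make the latent-GP idea concrete: a smooth latent function is drawn from a Gaussian process and then passed through a link function to parameterize a non-Gaussian observation distribution, here a Poisson with a log link. This is a minimal NumPy sketch of the general construction, not the paper's GDGP model (which stacks multiple GP layers); the kernel choice and 1-D setting are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(x1, x2, lengthscale=0.2, variance=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

# Inputs and one draw of the latent GP (the hidden smooth surface).
x = np.linspace(0.0, 1.0, 50)
K = rbf_kernel(x, x) + 1e-8 * np.eye(len(x))  # jitter for numerical stability
f = rng.multivariate_normal(np.zeros(len(x)), K)

# Non-Gaussian observation layer: the log link maps the latent values
# to Poisson rates, so the observed outputs are non-negative counts.
rate = np.exp(f)
y = rng.poisson(rate)
```

Swapping the Poisson layer for a negative binomial, categorical, or heteroskedastic Gaussian likelihood changes only the final sampling step, which is what makes the latent-GP treatment of these response types unified.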
📝 Abstract
Gaussian process (GP) emulators have become essential tools for approximating complex simulators, significantly reducing computational demands in optimization, sensitivity analysis, and model calibration. While traditional GP emulators effectively model continuous, Gaussian-distributed simulator outputs with homogeneous variability, they typically struggle with discrete, heteroskedastic Gaussian, or otherwise non-Gaussian data, limiting their applicability to increasingly common stochastic simulators. In this work, we introduce a scalable Generalized Deep Gaussian Process (GDGP) emulation framework designed to accommodate simulators with heteroskedastic Gaussian outputs and a wide range of non-Gaussian response distributions, including Poisson, negative binomial, and categorical distributions. The GDGP framework leverages the expressiveness of deep Gaussian processes (DGPs) and extends them with latent GP structures, enabling it to capture the complex, non-stationary behavior inherent in many simulators while also modeling non-Gaussian simulator outputs. We make GDGP scalable by incorporating the Vecchia approximation for settings with a large number of input locations, and by developing efficient inference procedures for handling large numbers of replicates. In particular, we present methodological developments that further reduce the computational cost of the approach for heteroskedastic Gaussian responses. Through a series of synthetic and empirical examples, we demonstrate that these extensions make GDGP emulators practical and provide a unified methodology capable of addressing diverse modeling challenges. The proposed GDGP framework is implemented in the open-source R package dgpsi.
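The Vecchia approximation behind the scalability claim replaces the full joint Gaussian density with a product of conditionals, each conditioning on only the m nearest previously ordered points, so the O(n^3) factorization becomes n small m-by-m solves. The sketch below is a generic 1-D illustration under an RBF kernel with unit variance, not dgpsi's implementation; the helper names and ordering scheme are assumptions. With m = n - 1 it recovers the exact joint log-likelihood.

```python
import numpy as np

def rbf(x1, x2, lengthscale=0.3):
    """Squared-exponential covariance with unit variance."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def vecchia_loglik(x, y, m=5, noise=1e-4):
    """Vecchia-style Gaussian log-likelihood for ordered 1-D inputs:
    each point conditions on at most its m nearest predecessors."""
    n = len(x)
    ll = 0.0
    for i in range(n):
        if i == 0:
            mean, var = 0.0, 1.0 + noise
        else:
            # Indices of the (up to) m nearest predecessors of point i.
            prev = np.argsort(np.abs(x[:i] - x[i]))[:m]
            K_nn = rbf(x[prev], x[prev]) + noise * np.eye(len(prev))
            k_in = rbf(x[prev], x[i:i + 1])[:, 0]
            w = np.linalg.solve(K_nn, k_in)
            mean = w @ y[prev]                # conditional mean
            var = 1.0 + noise - w @ k_in      # conditional variance
        ll += -0.5 * (np.log(2 * np.pi * var) + (y[i] - mean) ** 2 / var)
    return ll
```

Because each term is a small conditional Gaussian density, the overall cost is O(n m^3) rather than O(n^3), which is what allows the framework to handle large numbers of input locations.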