Data-Efficient Generative Modeling of Non-Gaussian Global Climate Fields via Scalable Composite Transformations

📅 2026-02-26
🤖 AI Summary
This work addresses the scarcity of training data in climate modeling, which stems from the high computational cost of climate simulations and hinders the quantification of uncertainty in climate projections. The authors propose a scalable composite transformation that maps non-Gaussian global climate fields to a multivariate standard normal space. A nonparametric Bayesian transport map models spatial dependence, while spatially varying marginal models, each a parametric model followed by a semi-parametric B-spline correction, capture non-Gaussian behavior and heavy tails. The marginal parameters are smoothed over space with Gaussian-process priors under low-rank approximations, which makes the computational cost linear in the number of grid locations. Trained on only 10 samples of global log-precipitation-rate fields at more than 50,000 grid locations, the method reproduces both spatial dependence and marginal characteristics, including the tails, and outperforms a state-of-the-art competitor trained on 80 samples, that is, with one-eighth as much training data.

📝 Abstract
Quantifying uncertainty in future climate projections is hindered by the prohibitive computational cost of running physical climate models, which severely limits the availability of training data. We propose a data-efficient framework for emulating the internal variability of global climate fields, specifically designed to overcome these sample-size constraints. Inspired by copula modeling, our approach constructs a highly expressive joint distribution via a composite transformation to a multivariate standard normal space. We combine a nonparametric Bayesian transport map for spatial dependence modeling with flexible, spatially varying marginal models, essential for capturing non-Gaussian behavior and heavy-tailed extremes. These marginals are defined by a parametric model followed by a semi-parametric B-spline correction to capture complex distributional features. The marginal parameters are spatially smoothed using Gaussian-process priors with low-rank approximations, rendering the computational cost linear in the spatial dimension. When applied to global log-precipitation-rate fields at more than 50,000 grid locations, our stochastic surrogate achieves high fidelity, accurately quantifying the climate distribution's spatial dependence and marginal characteristics, including the tails. Using only 10 training samples, it outperforms a state-of-the-art competitor trained on 80 samples, effectively octupling the computational budget for climate research. We provide a Python implementation at https://github.com/jobrachem/ppptm.
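
To make the idea of a composite transformation concrete, the following toy sketch chains a marginal step and a dependence step so that skewed, spatially correlated data end up approximately standard normal and uncorrelated. It is only an illustration under strong simplifying assumptions and not the authors' method or the `ppptm` API: the marginal model is a plain exponential distribution without any B-spline correction, the dependence step is a linear triangular map that conditions each location on its immediate predecessor only, nothing is Bayesian, and every function name (`simulate`, `to_normal_scores`, `residualize`, `composite_transform`) is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()  # standard normal, provides cdf and inv_cdf


def simulate(n_samples, n_locations, rho=0.8, scale=2.0, seed=0):
    """Toy data: AR(1) Gaussian dependence pushed to exponential marginals."""
    rng = random.Random(seed)
    fields = []
    for _ in range(n_samples):
        latent = [rng.gauss(0.0, 1.0)]
        for _ in range(n_locations - 1):
            noise = rng.gauss(0.0, 1.0)
            latent.append(rho * latent[-1] + math.sqrt(1.0 - rho**2) * noise)
        # Exponential quantile function applied to the normal CDF value.
        fields.append([-scale * math.log(1.0 - STD_NORMAL.cdf(x)) for x in latent])
    return fields


def to_normal_scores(values):
    """Marginal step: z = Phi^{-1}(F(y)), F a fitted exponential CDF (assumed model)."""
    scale = fmean(values)  # maximum-likelihood estimate of the exponential scale
    scores = []
    for y in values:
        u = 1.0 - math.exp(-y / scale)
        u = min(max(u, 1e-12), 1.0 - 1e-12)  # keep inv_cdf strictly inside (0, 1)
        scores.append(STD_NORMAL.inv_cdf(u))
    return scores


def standardize(values):
    """Center and rescale to zero mean and unit variance."""
    mean = fmean(values)
    sd = math.sqrt(fmean([(v - mean) ** 2 for v in values]))
    return [(v - mean) / sd for v in values]


def residualize(target, parent):
    """Dependence step: remove the least-squares linear effect of the parent."""
    mt, mp = fmean(target), fmean(parent)
    cov = fmean([(t - mt) * (p - mp) for t, p in zip(target, parent)])
    var = fmean([(p - mp) ** 2 for p in parent])
    slope = cov / var
    resid = [(t - mt) - slope * (p - mp) for t, p in zip(target, parent)]
    return standardize(resid)


def composite_transform(fields):
    """Apply the marginal step, then the triangular dependence step.

    `fields` is a list of samples, each a list of positive values per location.
    Returns one list of transformed values per location.
    """
    columns = [list(col) for col in zip(*fields)]
    z = [to_normal_scores(col) for col in columns]
    w = [standardize(z[0])]
    for i in range(1, len(z)):
        w.append(residualize(z[i], z[i - 1]))
    return w


fields = simulate(n_samples=2000, n_locations=4)
w = composite_transform(fields)
for i, col in enumerate(w):
    print(f"location {i}: mean={fmean(col):.3f}, var={fmean([v * v for v in col]):.3f}")
```

Running the maps in reverse (draw independent standard normals, undo the dependence step, then apply the fitted marginal quantile functions) would turn the same construction into a sampler, which is the sense in which such a transformation acts as a generative model.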

Problem

Research questions and friction points this paper is trying to address.

data efficiency
climate modeling
non-Gaussian
uncertainty quantification
extreme events

Innovation

Methods, ideas, or system contributions that make the work stand out.

data-efficient generative modeling
composite transformation
non-Gaussian climate fields
Bayesian transport map
spatially varying marginals