Estimating the Wasserstein barycenter of one-dimensional distributions under sparse sampling

📅 2026-06-08
🤖 AI Summary
This study addresses the challenge of estimating the Wasserstein barycenter under sparse sampling, where each individual is observed through only a few i.i.d. one-dimensional samples. In this regime the conventional empirical Wasserstein barycenter estimator can be substantially biased. To overcome this limitation, the authors propose the marginal-constructed barycenter (MCB) estimator. It characterizes the distribution of latent individual quantiles through the marginal distributions of cumulative distribution function (CDF) values, which avoids directly reconstructing either the individual distributions or the population distribution. The method estimates these marginal CDF-value distributions with binomial mixture methods and takes the mean of the resulting distribution of latent quantiles as the estimate of the barycenter quantile function at each quantile level. The theoretical analysis establishes conditions under which the MCB estimator is pointwise consistent and asymptotically normal. Simulations show that it can substantially outperform the empirical Wasserstein barycenter in sparse settings, and its practical use is illustrated in an analysis of HIV-1 sequence data.
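One way to see why the marginal distributions of CDF values are enough is sketched below in LaTeX. The notation is ours, not necessarily the paper's: unit i has CDF F_i, quantile function Q_i, n_i observed draws X_{i1}, ..., X_{in_i}, and G_alpha is the distribution of the latent quantile Q_i(alpha). The second line uses the standard property of the generalized inverse, valid for alpha in (0, 1). The third line assumes E|Q_i(alpha)| is finite. The last line shows where a binomial mixture enters, assuming n_i is independent of F_i. This is a reconstruction from the summary and abstract, not the authors' derivation.

```latex
\begin{aligned}
\mu(\alpha) &= \mathbb{E}\,[Q_i(\alpha)], \qquad Q_i(\alpha) = \inf\{x : F_i(x) \ge \alpha\},\\
Q_i(\alpha) \le x \iff F_i(x) \ge \alpha
  &\;\Longrightarrow\; G_\alpha(x) := \Pr\{Q_i(\alpha) \le x\} = \Pr\{F_i(x) \ge \alpha\},\\
\mu(\alpha) &= \int_0^\infty \{1 - G_\alpha(x)\}\,dx \;-\; \int_{-\infty}^0 G_\alpha(x)\,dx,\\
N_i(x) &= \#\{j \le n_i : X_{ij} \le x\}, \qquad N_i(x) \mid F_i, n_i \sim \mathrm{Binomial}\{n_i, F_i(x)\}.
\end{aligned}
```

For each fixed x, G_alpha(x) depends only on the marginal law of F_i(x), and that law is the mixing distribution of the observable counts N_i(x).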
📝 Abstract
We study distributional data under sparse sampling where each unit is represented by a probability distribution on the real line observed only through a small i.i.d. sample. A natural notion of central tendency for one-dimensional distributional data is the Wasserstein barycenter, whose quantile function is the pointwise average of the unit-level quantile functions. We focus on pointwise estimation of the Wasserstein barycenter quantile function: at a given quantile level, the target is the population mean of the corresponding unit-level quantiles. A naive plug-in estimator is the empirical Wasserstein barycenter, which treats observed unit-level empirical distributions as the true latent unit-level distributions. Under sparse sampling, however, this estimator can be severely biased. We propose an approach that avoids directly estimating either the unit-level distributions or the full population law of distributions. We start with the more ambitious goal of characterizing the distribution of latent unit-level quantiles at a given quantile level. We show that this distribution can be written in terms of the marginal distributions of the unit-level CDF values, which can be estimated using binomial mixture methods. This motivates our estimator, the marginal-constructed barycenter (MCB) estimator, obtained by taking the mean of the estimated distribution of latent unit-level quantiles. We establish conditions under which the MCB estimator is pointwise consistent and asymptotically normal, and show through simulations that it can substantially outperform the empirical Wasserstein barycenter under sparse sampling. We illustrate the method in an analysis of HIV-1 sequence data from the HVTN 502/503 vaccine efficacy trials, using the barycenter to summarize and compare within-participant distributions of viral sequence features when only a small number of sequences are available per participant.
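The Python sketch below contrasts the naive plug-in with an MCB-style construction. It uses only the standard library. Everything in it is an illustrative assumption and not the authors' implementation. This includes the function names (binom_pmf, npmle_mixing, empirical_barycenter, mcb_sketch), the use of a grid-based nonparametric MLE fitted by EM as the "binomial mixture method", the monotone fix-up, the truncation to a finite x-grid, and the Gaussian toy population. It does not reproduce the conditions under which the paper proves consistency. In particular, a mixture of Binomial(n, p) laws identifies only the first n moments of its mixing distribution, so with two to four draws per unit the printed numbers should not be read as evidence of accuracy.

```python
import math
import random
import statistics
from collections import Counter


def binom_pmf(k, n, p):
    """Binomial(n, p) mass at k (Python evaluates 0.0**0 as 1.0, so p=0 and p=1 work)."""
    return math.comb(n, k) * p**k * (1.0 - p) ** (n - k)


def npmle_mixing(pairs, p_grid, iters=100):
    """Hypothetical helper: EM for a grid-based NPMLE of a binomial mixing distribution.

    pairs maps (n_i, k_i) -> number of units with n_i draws and k_i successes.
    Returns weights on p_grid that sum to 1.
    """
    K = len(p_grid)
    w = [1.0 / K] * K
    lik = {(n, k): [binom_pmf(k, n, p) for p in p_grid] for (n, k) in pairs}
    total = sum(pairs.values())
    for _ in range(iters):
        new_w = [0.0] * K
        for nk, mult in pairs.items():
            row = lik[nk]
            denom = sum(wj * lj for wj, lj in zip(w, row))
            for j in range(K):
                new_w[j] += mult * w[j] * row[j] / denom  # posterior share of grid point j
        w = [v / total for v in new_w]
    return w


def empirical_barycenter(samples, alpha):
    """Naive plug-in: average of unit-level empirical quantiles X_(ceil(n * alpha))."""
    qs = []
    for s in samples:
        xs = sorted(s)
        qs.append(xs[max(math.ceil(len(xs) * alpha), 1) - 1])
    return sum(qs) / len(qs)


def mcb_sketch(samples, alpha, x_grid, p_grid, iters=100):
    """Hypothetical MCB-style estimate of the barycenter quantile at level alpha."""
    G = []
    for x in x_grid:
        # Given F_i, the count #{X_ij <= x} is Binomial(n_i, F_i(x)).
        pairs = Counter((len(s), sum(v <= x for v in s)) for s in samples)
        w = npmle_mixing(pairs, p_grid, iters)
        # G(x) = P(Q_i(alpha) <= x) = P(F_i(x) >= alpha) under the fitted mixing law.
        g = sum(wj for wj, p in zip(w, p_grid) if p >= alpha)
        G.append(max(g, G[-1]) if G else g)  # enforce a nondecreasing CDF
    # Mean of a law on [a, b] with CDF G is a + integral_a^b (1 - G(x)) dx (trapezoid rule);
    # mass outside the grid is effectively clipped to its endpoints.
    est = x_grid[0]
    for k in range(len(x_grid) - 1):
        dx = x_grid[k + 1] - x_grid[k]
        est += dx * (1.0 - 0.5 * (G[k] + G[k + 1]))
    return est


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = 0.9
    # Toy population (assumption): unit i is N(m_i, 1) with m_i ~ N(0, 1), seen via 2-4 draws.
    samples = []
    for _ in range(400):
        m = rng.gauss(0.0, 1.0)
        samples.append([rng.gauss(m, 1.0) for _ in range(rng.randint(2, 4))])
    x_grid = [-5.0 + 0.25 * k for k in range(41)]
    p_grid = [j / 40 for j in range(41)]
    # In this toy model the true barycenter quantile is E[m_i] + Phi^{-1}(alpha) = Phi^{-1}(alpha).
    print("target   ", round(statistics.NormalDist().inv_cdf(alpha), 3))
    print("empirical", round(empirical_barycenter(samples, alpha), 3))
    print("MCB-style", round(mcb_sketch(samples, alpha, x_grid, p_grid), 3))
```

The naive estimator's weakness is easy to see in this toy setting. With two draws and alpha = 0.9, the unit-level empirical quantile is simply the larger draw, which systematically understates the latent 0.9-quantile. The MCB-style route never forms unit-level quantiles. It only pools counts at each x.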
Problem

Research questions and friction points this paper is trying to address.

Wasserstein barycenter
sparse sampling
distributional data
quantile function
one-dimensional distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Wasserstein barycenter
sparse sampling
quantile function
marginal-constructed barycenter
binomial mixture