Perturbation-based Effect Measures for Compositional Data

📅 2023-11-30
📈 Citations: 3
Influential: 0
🤖 AI Summary
Background: Measuring effects in high-dimensional, sparse compositional data (such as microbiome abundances) is challenging: traditional parametric models capture these traits poorly, and assessing without bias how summary statistics of a composition (e.g., diversity indices) affect a response variable is not straightforward.
Method: We propose the Average Perturbation Effect (APE) framework, which uses hypothetical data perturbations to define interpretable statistical functionals directly on the simplex. These effects naturally account for the confounding that biases frequently used marginal dependence analyses.
Contribution/Results: APEs are estimated efficiently by deriving a perturbation-dependent reparametrization and applying semiparametric estimation techniques. The estimators are analyzed empirically on simulated and semi-synthetic data and show advantages over existing techniques on data from New York schools (racial diversity and academic performance) and on microbiome data (microbiome and host phenotype relationships).
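To make "hypothetical perturbations on the simplex" concrete, here is a minimal Python sketch of two standard operations from Aitchison's compositional geometry: multiplying one part by e^t and re-closing (the other parts keep their ratios), and powering a composition towards the uniform one, which raises Shannon diversity. These particular perturbations and all function names are illustrative assumptions; the paper's own perturbation families may differ.

```python
import numpy as np

# Illustrative perturbations on the simplex (assumed, not necessarily the paper's).

def closure(x):
    """Rescale positive vectors (last axis) so that each sums to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def perturb_component(z, j, t):
    """Multiply part j by exp(t) and re-close; the remaining parts keep their ratios."""
    z = np.array(z, dtype=float)  # copy so the caller's array is not modified
    z[..., j] *= np.exp(t)
    return closure(z)

def power_towards_uniform(z, alpha):
    """Aitchison powering C(z**alpha); for a strictly positive z and 0 < alpha < 1
    the result lies closer to the uniform composition, and alpha = 1 returns z."""
    return closure(np.asarray(z, dtype=float) ** alpha)

def shannon_diversity(z):
    """Shannon entropy -sum_k z_k log z_k, with 0 log 0 treated as 0."""
    z = np.asarray(z, dtype=float)
    safe = np.where(z > 0, z, 1.0)  # log(1) = 0, so zero parts contribute nothing
    return -(z * np.log(safe)).sum(axis=-1)

z = np.array([0.7, 0.2, 0.1])
print(perturb_component(z, 0, 0.5))  # part 0 grows; parts 1 and 2 keep their 2:1 ratio
print(shannon_diversity(z), shannon_diversity(power_towards_uniform(z, 0.5)))  # diversity rises
```

Powering is a convenient "diversity-increasing" direction because, for a positive composition, shrinking alpha from 1 towards 0 moves monotonically to the uniform composition, where Shannon diversity is maximal (log of the number of parts).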
📝 Abstract
Existing effect measures for compositional features are inadequate for many modern applications, for example, in microbiome research, since compositional data in these settings display traits such as high-dimensionality and sparsity that traditional parametric approaches model poorly. Further, assessing in an unbiased way how summary statistics of a composition (e.g., racial diversity) affect a response variable is not straightforward. We propose a framework based on hypothetical data perturbations which defines interpretable statistical functionals on the compositions themselves, which we call average perturbation effects. These effects naturally account for confounding that biases frequently used marginal dependence analyses. We show how average perturbation effects can be estimated efficiently by deriving a perturbation-dependent reparametrization and applying semiparametric estimation techniques. We analyze the proposed estimators empirically on simulated and semi-synthetic data and demonstrate advantages over existing techniques on data from New York schools and microbiome data.
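A minimal plug-in sketch, assuming (this is not stated above) that an average perturbation effect has the form E[d/dt m(ψ(Z, t)) at t = 0] for a regression function m(z) = E[Y | Z = z] and a perturbation path ψ with ψ(z, 0) = z. The helper names `closure`, `perturb_component`, and `plugin_ape` are hypothetical. The paper's estimators additionally rely on a perturbation-dependent reparametrization and semiparametric corrections, which this plug-in omits, so it should not be read as the authors' efficient estimator.

```python
import numpy as np

# ASSUMPTION: APE = E[d/dt m(psi(Z, t)) at t = 0]; plug-in only, no semiparametric correction.

def closure(x):
    """Rescale each row of a positive array so that it sums to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def perturb_component(Z, j, t):
    """Hypothetical perturbation path: scale part j by exp(t) and re-close."""
    Z = np.array(Z, dtype=float)
    Z[..., j] *= np.exp(t)
    return closure(Z)

def plugin_ape(m_hat, Z, perturb, h=1e-4):
    """Average over samples of a central finite difference of m_hat along the path.

    m_hat:   callable mapping an (n, d) array of compositions to n predictions
    perturb: callable (Z, t) -> perturbed compositions, equal to Z at t = 0
    """
    up = m_hat(perturb(Z, h))
    down = m_hat(perturb(Z, -h))
    return float(np.mean((up - down) / (2.0 * h)))

# Usage with a known regression m(z) = 3 * z_1 while perturbing part 0.
# Along this path dz_1/dt = -z_0 * z_1 at t = 0, so the target is -3 * mean(z_0 * z_1).
Z = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.2, 0.6]])
m = lambda W: 3.0 * W[:, 1]
ape = plugin_ape(m, Z, lambda W, t: perturb_component(W, 0, t))
print(round(ape, 6))  # -0.285
```

In an application `m` would be a fitted regression from any learner; a plain plug-in of this kind inherits that learner's bias, which is the usual motivation for the semiparametric corrections the abstract mentions.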
Problem

Research questions and friction points this paper is trying to address.

Inadequate effect measures for high-dimensional sparse compositional data
Unbiased assessment of how composition summary statistics affect response variables
Confounding bias in marginal dependence analyses for compositional features
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hypothetical data perturbations framework
Perturbation-dependent reparametrization technique
Efficient semiparametric estimation of average perturbation effects
A. Lundborg
Department of Mathematical Sciences, University of Copenhagen, Denmark
Niklas Pfister
Associate Professor, University of Copenhagen