Log-Ratio Propagation on the Simplex: A Theory of Cellwise Contamination for Compositional Data

📅 2026-05-29

🤖 AI Summary

This study addresses why standard Euclidean cellwise-robust methods become ill-posed on the simplex: contaminating a single compositional component induces a global shift in log-ratio coordinates. Building on a scale-invariant multiplicative perturbation model, the work establishes a theoretical framework for cellwise contamination in compositional data and proves that contamination of one component appears as a rank-one shift in log-ratio space. The authors introduce a contamination propagation theorem and a diagnostic fingerprint based on the influence function. Using the centered log-ratio (clr) transformation, isometric log-ratio (ilr) coordinates, and contrast matrix analysis, they quantify the cellwise breakdown points of MCD, S-, τ-, and coordinatewise M-estimators (estimators whose Euclidean breakdown is witnessed by a column-concentrated configuration), showing that these are reduced by a factor of $(D-1)/D$ relative to their Euclidean counterparts. Furthermore, they show that the influence function of the variation matrix identifies the contaminated component, laying the theoretical foundation for cellwise-robust analysis of compositional data.
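The one-part case of the propagation claim can be worked by hand. The derivation below is a sketch under the standard definitions of clr and ilr (ilr taken with an orthonormal contrast matrix $V$, closure written $\mathcal{C}$); the symbols $\delta$, $j$ and $m$ are introduced here for illustration, and this is not the paper's own proof. The last line only restates the cell-count arithmetic behind the $(D-1)/D$ factor.

```latex
% Multiplicative perturbation of part j by e^{\delta}; closure C does not affect log-ratios
x' = \mathcal{C}\bigl(x_1,\dots,e^{\delta}x_j,\dots,x_D\bigr)

% clr coordinates: clr(x)_k = \log x_k - \frac{1}{D}\sum_{i=1}^{D}\log x_i
\mathrm{clr}(x')_k - \mathrm{clr}(x)_k = \delta\,\mathbb{1}[k=j] - \frac{\delta}{D}
\quad\Longrightarrow\quad
\Delta_{\mathrm{clr}} = \delta\Bigl(e_j - \tfrac{1}{D}\mathbf{1}\Bigr)

% Every clr coordinate moves, and the shift is \delta times a fixed vector (rank one)
\|\Delta_{\mathrm{clr}}\|^2
  = \delta^2\Bigl[\bigl(1-\tfrac{1}{D}\bigr)^2 + \tfrac{D-1}{D^2}\Bigr]
  = \delta^2\,\tfrac{D-1}{D}

% ilr with any D x (D-1) contrast matrix V, where V^T V = I_{D-1}, V^T 1 = 0,
% hence V V^T = I_D - \tfrac{1}{D}\mathbf{1}\mathbf{1}^\top
\Delta_{\mathrm{ilr}} = V^\top \Delta_{\mathrm{clr}} = \delta\,V^\top e_j,
\qquad
\|\Delta_{\mathrm{ilr}}\|^2 = \delta^2 (V V^\top)_{jj} = \delta^2\bigl(1-\tfrac{1}{D}\bigr)

% Cell-count normalisation: m contaminated cells out of nD raw versus n(D-1) ilr cells
\frac{m}{nD} = \frac{D-1}{D}\cdot\frac{m}{n(D-1)}
```

The direction $V^\top e_j$ is the $j$-th row of the contrast matrix, while the length $|\delta|\sqrt{(D-1)/D}$ does not depend on $V$, which is why no choice of coordinates can shrink the displacement.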

📝 Abstract

Compositional data must be analysed through log-ratios: scale invariance, the defining axiom of the field, leaves no alternative. The centred log-ratio divides by the geometric mean of every part, so a single contaminated component shifts every centred-log-ratio coordinate at once, displacing the log-ratio vector by a fixed amount that no choice of coordinates can reduce. We develop a theory of cellwise contamination on the simplex around this observation. A scale-invariant contamination model built from multiplicative perturbation is combined with a propagation theorem showing that corruption of a single raw part induces a rank-one shift of the log-ratio vector, with direction determined by the contrast matrix. The resulting perturbation pattern is not equivalent to any independent cellwise contamination model in log-ratio coordinates, so standard Euclidean cellwise methods applied to log-ratios are ill-posed under the simplex contamination mechanism. For estimators whose Euclidean cellwise breakdown is witnessed by a column-concentrated configuration (a class including MCD, $S$-, $\tau$-, and coordinate-wise $M$-estimators of location and scatter), the cellwise breakdown value on the simplex is reduced by the factor $(D-1)/D$ relative to its Euclidean counterpart; this reduction is tight and arises purely from the normalisation mismatch between $nD$ raw cells and $n(D-1)$ ilr cells. The cellwise influence function for the variation matrix carries a diagnostic fingerprint: contamination of a single part inflates exactly one row and column, identifying the responsible component. These results form the theoretical foundation for cellwise-robust methods on the simplex; a companion paper develops a cellwise-robust PCA estimator that exploits the propagation geometry and demonstrates it on simulated and geochemical data.
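A small numerical check of the propagation and fingerprint claims can be written in plain Python (standard library only). It is a sketch under stated assumptions: the Helmert-type contrast matrix, the helper names (`clr`, `ilr`, `helmert_basis`, `variation_matrix`, `perturb_part`) and the simulated uniform data are hypothetical illustrations, not the paper's or any library's API. It also compares raw sample variation matrices before and after contamination rather than computing the cellwise influence function the paper derives.

```python
import math
import random

# Hypothetical helpers for illustration only; not the paper's or a library's API.

def clr(x):
    """Centred log-ratio: each log-part minus the mean log (log geometric mean)."""
    logs = [math.log(v) for v in x]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

def perturb_part(x, j, delta):
    """Cellwise contamination of one raw part: multiply x[j] by exp(delta)."""
    y = list(x)
    y[j] *= math.exp(delta)
    return y

def helmert_basis(D):
    """Orthonormal D x (D-1) contrast matrix V with V^T 1 = 0 (Helmert-type)."""
    V = [[0.0] * (D - 1) for _ in range(D)]
    for k in range(1, D):
        c = 1.0 / math.sqrt(k * (k + 1))
        for i in range(k):
            V[i][k - 1] = c
        V[k][k - 1] = -k * c
    return V

def ilr(x, V):
    """Isometric log-ratio coordinates: V^T clr(x)."""
    z = clr(x)
    return [sum(V[i][m] * z[i] for i in range(len(z))) for m in range(len(V[0]))]

def variation_matrix(X):
    """T[k][l] = sample variance of log(x_k / x_l) over the rows of X."""
    n, D = len(X), len(X[0])
    T = [[0.0] * D for _ in range(D)]
    for k in range(D):
        for l in range(D):
            r = [math.log(row[k] / row[l]) for row in X]
            mean_r = sum(r) / n
            T[k][l] = sum((v - mean_r) ** 2 for v in r) / (n - 1)
    return T

def norm(v):
    return math.sqrt(sum(a * a for a in v))

random.seed(0)
D, n, j, delta = 5, 200, 1, 3.0

# 1. One contaminated part shifts every clr coordinate by delta * (e_j - 1/D).
x = [random.uniform(0.5, 2.0) for _ in range(D)]
clr_shift = [a - b for a, b in zip(clr(perturb_part(x, j, delta)), clr(x))]
expected = [delta * ((1.0 if k == j else 0.0) - 1.0 / D) for k in range(D)]

# 2. The ilr shift has norm |delta| * sqrt((D-1)/D) for any orthonormal contrast
#    matrix: here a Helmert basis and the same basis with its rows (parts) reversed.
V1 = helmert_basis(D)
V2 = V1[::-1]
ilr_shift_norms = [
    norm([a - b for a, b in zip(ilr(perturb_part(x, j, delta), V), ilr(x, V))])
    for V in (V1, V2)
]

# 3. Fingerprint: contaminating part j in every 10th row changes only row/column j
#    of the sample variation matrix; entries with k, l != j are untouched.
X = [[random.uniform(0.5, 2.0) for _ in range(D)] for _ in range(n)]
Xc = [perturb_part(row, j, delta) if i % 10 == 0 else row for i, row in enumerate(X)]
T, Tc = variation_matrix(X), variation_matrix(Xc)
changed = {(k, l) for k in range(D) for l in range(D) if abs(Tc[k][l] - T[k][l]) > 1e-9}

print(all(math.isclose(a, b, abs_tol=1e-12) for a, b in zip(clr_shift, expected)))
print([round(v, 4) for v in ilr_shift_norms], round(abs(delta) * math.sqrt((D - 1) / D), 4))
print(sorted(changed))
```

Both contrast matrices give the same shift length, although the shift vectors themselves differ because the two bases assign different rows to part `j`. Closing each composition to sum 1 would leave all three checks unchanged, since clr coordinates and pairwise log-ratios are scale invariant.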

Problem

Research questions and friction points this paper is trying to address.

compositional data

cellwise contamination

log-ratio transformation

simplex

robust estimation

Innovation

Methods, ideas, or system contributions that make the work stand out.

compositional data

cellwise contamination

log-ratio transformation