🤖 AI Summary
This study addresses why conventional Euclidean cellwise-robust methods are ill-posed on the simplex: contaminating a single compositional component shifts every centred log-ratio coordinate at once. Building on a scale-invariant multiplicative perturbation model, the work develops a theoretical framework for cellwise contamination in compositional data and proves a propagation theorem: contamination of one raw part appears as a rank-one shift in log-ratio space, with direction determined by the contrast matrix. Using the centred log-ratio transformation, isometric log-ratio (ilr) coordinates, and contrast matrix analysis, the authors show that for estimators whose Euclidean cellwise breakdown is witnessed by a column-concentrated configuration (including MCD, $S$-, $τ$-, and coordinate-wise $M$-estimators), the cellwise breakdown value on the simplex is reduced by a factor of $(D-1)/D$ relative to the Euclidean counterpart. They further show that the cellwise influence function of the variation matrix carries a diagnostic fingerprint: contaminating a single part inflates exactly one row and column, which identifies the responsible component. Together these results lay the theoretical foundation for cellwise-robust analysis of compositional data.
📝 Abstract
Compositional data must be analysed through log-ratios: scale invariance, the defining axiom of the field, leaves no alternative. The centred log-ratio divides by the geometric mean of every part, so a single contaminated component shifts every centred-log-ratio coordinate at once, displacing the log-ratio vector by a fixed amount that no choice of coordinates can reduce. We develop a theory of cellwise contamination on the simplex around this observation. A scale-invariant contamination model built from multiplicative perturbation combines with a propagation theorem showing that corruption of a single raw part induces a rank-one shift of the log-ratio vector, with direction determined by the contrast matrix. The resulting perturbation pattern is not equivalent to any independent cellwise contamination model in log-ratio coordinates -- so standard Euclidean cellwise methods applied to log-ratios are ill-posed under the simplex contamination mechanism. For estimators whose Euclidean cellwise breakdown is witnessed by a column-concentrated configuration -- a class including MCD, $S$-, $τ$-, and coordinate-wise $M$-estimators of location and scatter -- the cellwise breakdown value on the simplex is reduced by the factor $(D-1)/D$ relative to its Euclidean counterpart, a reduction that is tight and arises purely from the normalisation mismatch between $nD$ raw cells and $n(D-1)$ ilr cells. The cellwise influence function for the variation matrix carries a diagnostic fingerprint: contamination of a single part inflates exactly one row and column, identifying the responsible component. These results form the theoretical foundation for cellwise-robust methods on the simplex; a companion paper develops a cellwise-robust PCA estimator that exploits the propagation geometry and demonstrates it on simulated and geochemical data.
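The propagation and fingerprint claims can be checked numerically. The following is a minimal sketch, not the authors' code: the simulated log-normal data, the 10% contamination rate, the factor range, and all helper names (`closure`, `clr`, `variation_matrix`) are assumptions made for illustration. It multiplies one raw part by a factor $c_i$ in some rows and confirms that the centred-log-ratio shift equals $\log(c_i)\,(e_j - \mathbf{1}/D)$, that is, $\log c_i$ times the $j$-th column of the centring matrix, so the shift matrix has rank one, and that only row and column $j$ of the variation matrix change.

```python
import numpy as np

def closure(x):
    """Rescale each row to sum to 1 (a point on the simplex)."""
    return x / x.sum(axis=-1, keepdims=True)

def clr(x):
    """Centred log-ratio: log of each part minus the row mean of the logs."""
    logx = np.log(x)
    return logx - logx.mean(axis=-1, keepdims=True)

def variation_matrix(x):
    """Sample variance of every pairwise log-ratio log(x_k / x_l)."""
    logx = np.log(x)
    log_ratios = logx[:, :, None] - logx[:, None, :]  # shape (n, D, D)
    return log_ratios.var(axis=0, ddof=1)

# Illustrative (assumed) setup: n compositions with D parts, part j contaminated.
rng = np.random.default_rng(0)
n, D, j = 200, 5, 2
X = closure(rng.lognormal(size=(n, D)))

# Multiplicative perturbation of part j in 10% of the rows, then re-closure.
factors = np.ones(n)
bad_rows = rng.choice(n, size=n // 10, replace=False)
factors[bad_rows] = rng.uniform(5.0, 20.0, size=bad_rows.size)
X_cont = X.copy()
X_cont[:, j] *= factors
X_cont = closure(X_cont)

# Propagation: every clr coordinate moves, along one fixed direction.
shift = clr(X_cont) - clr(X)
direction = np.full(D, -1.0 / D)
direction[j] += 1.0                       # e_j - 1/D
expected_shift = np.outer(np.log(factors), direction)
shift_matches = np.allclose(shift, expected_shift)
shift_rank = np.linalg.matrix_rank(shift, tol=1e-8)

# Fingerprint: which variation-matrix entries differ after contamination?
changed = ~np.isclose(variation_matrix(X_cont), variation_matrix(X))

print("clr shift matches log(c) * (e_j - 1/D):", shift_matches)
print("rank of the clr shift matrix:", shift_rank)
print("changed entries of the variation matrix:")
print(changed.astype(int))
```

In this sketch the uncontaminated log-ratios are untouched because re-closure only rescales a row, and a pairwise log-ratio does not depend on that scale; the same contamination nonetheless moves all $D$ clr coordinates of the affected rows.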