Learning Collapsed Patterns in Compositional Data: A Bayesian Heterogeneous Relative-Shift Approach

📅 2026-06-05

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

High-dimensional compositional data often exhibit both latent heterogeneous subpopulations and sparse effect structures, yet existing methods struggle to perform clustering and within-cluster dimension reduction simultaneously. This work proposes a Bayesian heterogeneous relative-shift regression model in which a projection-based shrinkage prior on identifiable contrasts induces exact ties among coefficients within each mixture component, while a mixture of finite mixtures prior infers the number of clusters. For posterior sampling, the authors develop a hybrid MCMC algorithm that embeds a deterministic surrogate collapse operator within the No-U-Turn Sampler (NUTS). Theoretical results establish posterior consistency for both the latent partition and the cluster-specific effect structures. Simulations show accurate recovery and strong predictive performance, and applications to cross-country macroeconomic data and spatial transcriptomics illustrate the method’s interpretability.

📝 Abstract

Relative-shift regression provides a principled framework for modeling compositional covariates by quantifying how the response changes when mass is reallocated from one component to another. Yet many emerging compositional data problems extend beyond this classical setting, involving high-dimensional predictors and regression effects that vary across latent subpopulations. This complexity poses a dual challenge unmet by existing methods: recovering latent cluster structure while simultaneously achieving dimension reduction within each cluster. We propose a Bayesian heterogeneous relative-shift regression model that jointly learns latent clusters and parsimonious effect structures. Methodologically, we combine a projection-based shrinkage prior on identifiable contrasts, which induces exact coefficient ties within mixture components, with a mixture of finite mixtures prior that infers the number of clusters. Computationally, we develop a scalable hybrid MCMC algorithm that embeds a deterministic surrogate collapse operator within NUTS. Theoretically, we establish posterior consistency for both the latent partition and cluster-specific effect structures. Simulations confirm accurate recovery and strong predictive performance, and applications to cross-country macroeconomic data and spatial transcriptomics demonstrate the method's interpretability and practical utility.
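
To make the relative-shift idea concrete, the following is a minimal sketch, assuming a plain linear predictor on the composition; the abstract does not state the paper's exact model form, so this is an illustration rather than the authors' method. The function names, the four-part composition, and the coefficient values are all hypothetical. The sketch shows that moving mass from one component to another changes the predictor by the moved mass times a coefficient contrast, that tied coefficients give a zero shift effect, and that adding a constant to every coefficient leaves the contrasts unchanged.

```python
import numpy as np

def shift_mass(x, src, dst, delta):
    """Return a copy of composition x with `delta` mass moved from src to dst."""
    if delta < 0 or delta > x[src]:
        raise ValueError("delta must lie between 0 and the mass held by src")
    x_new = x.copy()
    x_new[src] -= delta
    x_new[dst] += delta
    return x_new

def relative_shift_effect(beta, src, dst, delta):
    """Change in the linear predictor x @ beta when `delta` mass moves src -> dst.

    Assumes a linear predictor; only the contrast beta[dst] - beta[src] matters.
    """
    return delta * (beta[dst] - beta[src])

# Hypothetical 4-part composition (sums to 1) and hypothetical coefficients.
# Components 1 and 2 share a coefficient, i.e. they are "tied".
x = np.array([0.4, 0.3, 0.2, 0.1])
beta = np.array([1.0, 2.5, 2.5, -0.5])

# Move 0.1 of the total mass from component 0 to component 1.
x_shifted = shift_mass(x, src=0, dst=1, delta=0.1)
effect = relative_shift_effect(beta, src=0, dst=1, delta=0.1)

# The contrast formula agrees with recomputing the predictor directly.
direct_change = x_shifted @ beta - x @ beta

# Tied components: reallocating mass between them does not move the predictor,
# so the two components could be merged without changing the fit.
tied_effect = relative_shift_effect(beta, src=1, dst=2, delta=0.1)

# Adding a constant to every coefficient leaves every contrast unchanged,
# which is why contrasts, not raw coefficients, are the identifiable quantities.
effect_after_offset = relative_shift_effect(beta + 7.0, src=0, dst=1, delta=0.1)
```

In this toy setting, an exact tie between two coefficients means the corresponding components can be collapsed into one, which is one way to read the dimension reduction within a cluster that the abstract describes.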

Problem

Research questions and friction points this paper is trying to address.

compositional data

latent clusters

dimension reduction

heterogeneous effects

relative-shift regression

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian heterogeneous modeling

relative-shift regression

projection-based shrinkage prior