🤖 AI Summary
Existing causal mediation analysis methods struggle with the zero-inflated nature of single-cell data and often rely on strong distributional assumptions. This work proposes QuasiMed, a mediation framework for single-cell data built on a three-step procedure: first, candidate mediators are screened by combining penalized regression with marginal models; second, indirect effects are estimated through both the average expression level and the proportion of expressing cells; and third, hypothesis tests are run with multiplicity control to limit false positives. Because QuasiMed specifies only the mean functions of the mediation models, it avoids stringent distributional assumptions, which makes it well suited to zero-inflated single-cell data. In real-data-inspired simulations, QuasiMed showed high statistical power, false discovery rate control, and computational efficiency. The method was also applied to ROSMAP single-cell data to illustrate its potential for identifying mediating causal pathways.
📝 Abstract
Recent advances in single-cell technologies have deepened our understanding of gene regulation and cellular heterogeneity at single-cell resolution. Single-cell data contain both gene expression levels and the proportion of expressing cells, which makes them structurally different from bulk data. Methodological work on causal mediation analysis for single-cell data remains limited and often requires specific distributional assumptions. To address this challenge, we present QuasiMed, a mediation framework specialized for single-cell data. The proposed method comprises three steps: (i) screening mediator candidates through penalized regression and marginal models (similar to sure independence screening), (ii) estimating indirect effects through the average expression and the proportion of expressing cells, and (iii) hypothesis testing with multiplicity control. The key benefit of QuasiMed is that it specifies only the mean functions of the mediation models through a quasi-regression framework, thereby relaxing strict distributional assumptions. We evaluated the method in real-data-inspired simulations, where it demonstrated high power, false discovery rate control, and computational efficiency. Lastly, we applied QuasiMed to ROSMAP single-cell data to illustrate its potential to identify mediating causal pathways. An R package is freely available on GitHub at https://github.com/sjahnn/QuasiMed.
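To make the screen, estimate, test structure described above more concrete, the following is a minimal Python sketch of a generic three-step mediation scan. It is not the QuasiMed implementation and does not mirror the R package's API: it swaps in plain marginal correlation screening, ordinary least squares with a product-of-coefficients indirect effect, a normal-approximation joint-significance (max-P) test, and Benjamini-Hochberg correction. All function names (`ols`, `two_sided_p`, `benjamini_hochberg`, `toy_mediation_scan`) and parameters (`n_keep`, `alpha`) are hypothetical, and the mediator matrix `M` is assumed to hold one per-subject summary per gene (for example, average expression or the proportion of expressing cells).

```python
import numpy as np
from math import erfc, sqrt


def ols(X, y):
    """Least-squares fit with an intercept; returns coefficients and standard errors."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (X1.shape[0] - X1.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))
    return beta, se


def two_sided_p(z):
    """Two-sided p-value for a z statistic under a standard normal approximation."""
    return erfc(abs(z) / sqrt(2))


def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean rejection mask from the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = len(p)
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])  # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject


def toy_mediation_scan(x, M, y, n_keep=10, alpha=0.05):
    """Illustrative scan: x is the exposure (n,), M the mediators (n, p), y the outcome (n,)."""
    # Step (i): marginal screening. Assumption: rank mediators by |corr(M_j, y)|;
    # constant columns give NaN correlations and are pushed to the bottom.
    score = np.nan_to_num(
        [abs(np.corrcoef(M[:, j], y)[0, 1]) for j in range(M.shape[1])]
    )
    keep = np.argsort(score)[::-1][:n_keep]

    indirect, pvals = [], []
    for j in keep:
        # Step (ii): product-of-coefficients indirect effect a_j * b_j.
        a, se_a = ols(x, M[:, j])                        # mediator ~ exposure
        b, se_b = ols(np.column_stack([x, M[:, j]]), y)  # outcome ~ exposure + mediator
        indirect.append(a[1] * b[2])
        # Step (iii): joint-significance test takes the larger of the two path p-values.
        pvals.append(max(two_sided_p(a[1] / se_a[1]), two_sided_p(b[2] / se_b[2])))

    # Multiplicity control over the screened candidates only.
    reject = benjamini_hochberg(pvals, alpha)
    return keep, np.array(indirect), np.array(pvals), reject
```

Calling `toy_mediation_scan(x, M, y)` returns the screened mediator indices, their estimated indirect effects, the max-P p-values, and the rejection mask. A mean-only (quasi-likelihood) fit, as the abstract describes, would replace the `ols` calls in step (ii); the sketch keeps least squares only to stay short and dependency-free.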