SEMMS with Random Effects: A Mixed-Model Extension for Variable Selection in Clustered and Longitudinal Data

📅 2026-03-16


🤖 AI Summary

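To address the degradation in variable-selection performance that results from ignoring correlation in clustered and longitudinal data, the SEMMS method is extended by introducing random effects, fitted with an alternating coordinate-ascent algorithm.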


📝 Abstract

SEMMS (Scalable Empirical-Bayes Model for Marker Selection) is a variable-selection procedure for generalized linear models that uses a three-component normal mixture prior on regression coefficients. In its original form, SEMMS assumes that all observations are independent. Many real-world datasets, however, arise from repeated-measures or clustered designs in which observations within the same subject are correlated. Ignoring this correlation inflates the apparent residual variance and can severely degrade variable-selection performance. We extend SEMMS to accommodate random intercepts, random slopes, or both, via an alternating coordinate-ascent algorithm. After each round of fixed-effect variable selection, the subject-level best linear unbiased predictors (BLUPs) are updated with lmer (Gaussian) or glmer (non-Gaussian); the fixed-effect step then operates on the random-effect-adjusted response. We describe the algorithm, evaluate its performance in three Gaussian simulation studies spanning a range of signal strengths, random-effect magnitudes, and sample/predictor-space regimes, and present a semi-synthetic real-data example. We further extend the framework to non-Gaussian families (Poisson, binomial) via an IRLS working-response adaptation: at each outer iteration the fixed-effects step uses the random-effect-adjusted working response computed from the current glmer fitted values rather than the raw response. When the fixed-effect signal is strong relative to the random-effect variance, both the original and extended procedures perform comparably. When the random-effect variance dominates, which is the scenario most likely to cause plain SEMMS to fail, the mixed-model extension recovers the exact true predictor set in 93% of simulated datasets (Gaussian), 61% (Poisson), and 65% (binomial), compared with 1%, 45%, and 39%, respectively, for plain SEMMS.
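The alternating scheme described in the abstract can be illustrated with a minimal Python sketch for the Gaussian random-intercept case. This is not the authors' implementation, and it rests on several simplifying assumptions: a single-predictor least-squares fit stands in for the SEMMS variable-selection step, a closed-form shrinkage update with fixed, assumed-known variance components (`sigma_b2`, `sigma_e2`) stands in for the `lmer` BLUP update, and the function name `fit_random_intercept` and the toy data are hypothetical.

```python
def fit_random_intercept(x, y, group, sigma_b2=4.0, sigma_e2=1.0, n_iter=200):
    """Alternate between a fixed-effect fit and a random-intercept update.

    Assumptions (not from the paper): one predictor, no fixed intercept,
    and variance components sigma_b2 / sigma_e2 treated as known.
    """
    groups = sorted(set(group))
    b = {g: 0.0 for g in groups}          # random intercepts start at zero
    beta = 0.0
    sxx = sum(xi * xi for xi in x)
    for _ in range(n_iter):
        # Fixed-effect step: least squares on the RE-adjusted response.
        # (Stand-in for the SEMMS selection step.)
        adjusted = [yi - b[g] for yi, g in zip(y, group)]
        beta = sum(xi * ai for xi, ai in zip(x, adjusted)) / sxx
        # Random-effect step: shrunken group mean of the current residuals.
        # (Stand-in for the lmer BLUP update.)
        for g in groups:
            res = [yi - beta * xi
                   for xi, yi, gi in zip(x, y, group) if gi == g]
            n_g = len(res)
            shrink = n_g * sigma_b2 / (n_g * sigma_b2 + sigma_e2)
            b[g] = shrink * sum(res) / n_g
    return beta, b


# Toy clustered data: true slope 2, group offsets confounded with x, no noise.
x = [0, 1, 2, 1, 2, 3, 2, 3, 4]
group = [0, 0, 0, 1, 1, 1, 2, 2, 2]
offsets = {0: -2.0, 1: 0.0, 2: 2.0}
y = [2.0 * xi + offsets[g] for xi, g in zip(x, group)]

# Naive fit that ignores the grouping entirely.
beta_naive = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)

beta_fit, b_fit = fit_random_intercept(x, y, group)
print(beta_naive, beta_fit)
```

In this toy setting the naive slope absorbs part of the between-group variation, while the alternating fit moves the estimate closer to the true value of 2 because the group offsets are removed from the response before each fixed-effect step. The correction is only partial here, since the assumed shrinkage pulls the estimated intercepts toward zero.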

Problem

Research questions and friction points this paper is trying to address.

variable selection

clustered data

longitudinal data

random effects

correlated observations

Innovation

Methods, ideas, or system contributions that make the work stand out.

mixed-effects model

variable selection

empirical Bayes

clustered data

longitudinal data
