🤖 AI Summary
This paper addresses covariate-effect inference and future-outcome prediction for high-dimensional multivariate longitudinal data, in which outcomes are dependent across variables and over time, may be of mixed types (e.g., continuous and discrete), and may be unobserved at some time points, for example because of right censoring. We propose a latent variable model that (i) introduces unobserved factors to capture between-variable and across-time dependence and to assist prediction; (ii) supports inference and prediction under a general setting with mixed-type outcomes and incompletely observed time points; (iii) comes with a central limit theorem for the regression coefficient estimators, which supports statistical inference on covariate effects; and (iv) uses an information criterion to choose the number of factors. The model is applied to customer grocery shopping records to predict and understand shopping behavior.
📝 Abstract
High-dimensional multivariate longitudinal data, which arise when many outcome variables are measured repeatedly over time, are becoming increasingly common in the social, behavioral and health sciences. We propose a latent variable model for drawing statistical inferences on covariate effects and predicting future outcomes based on such data. The model introduces unobserved factors to account for between-variable and across-time dependence and to assist prediction. Statistical inference and prediction tools are developed under a general setting that allows outcome variables to be of mixed types and possibly unobserved at certain time points, for example, due to right censoring. A central limit theorem is established for drawing statistical inferences on regression coefficients. Additionally, an information criterion is introduced to choose the number of factors. The proposed model is applied to customer grocery shopping records to predict and understand shopping behavior.
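To make the data setting concrete, the following is a minimal simulation sketch of mixed-type longitudinal outcomes driven by covariates and unobserved factors, with right censoring. It is not the paper's model specification: the function name `simulate_longitudinal`, the time-constant factors, the Gaussian and logistic links, the split into continuous and binary outcomes, and all dimensions are assumptions chosen for illustration only.

```python
import numpy as np

def simulate_longitudinal(n=200, T=6, J=8, K=2, p=3, seed=0):
    """Simulate mixed-type longitudinal outcomes from a generic latent factor model.

    Hypothetical setup (not taken from the paper): n individuals, T time points,
    J outcomes, K latent factors, p covariates.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, T, p))               # observed covariates
    beta = rng.normal(scale=0.5, size=(J, p))    # regression coefficients per outcome
    theta = rng.normal(size=(n, K))              # unobserved factors (assumed time-constant)
    A = rng.normal(scale=0.7, size=(J, K))       # factor loadings per outcome

    # Linear predictor = covariate effect + factor effect, shape (n, T, J).
    # Sharing theta across outcomes and time induces both kinds of dependence.
    eta = np.einsum("ntp,jp->ntj", X, beta) + (theta @ A.T)[:, None, :]

    Y = np.empty((n, T, J))
    n_cont = J // 2
    # First half of the outcomes: continuous, Gaussian noise around eta.
    Y[..., :n_cont] = eta[..., :n_cont] + rng.normal(size=(n, T, n_cont))
    # Second half: binary, generated through a logistic link.
    prob = 1.0 / (1.0 + np.exp(-eta[..., n_cont:]))
    Y[..., n_cont:] = rng.binomial(1, prob)

    # Right censoring: individual i is observed only at the first C_i time points.
    C = rng.integers(1, T + 1, size=n)               # censoring time in 1..T
    observed = np.arange(T)[None, :] < C[:, None]    # boolean mask, shape (n, T)
    Y[~observed] = np.nan                            # censored entries are missing
    return X, Y, observed

X, Y, observed = simulate_longitudinal()
print(Y.shape, observed.mean())
```

In this sketch the mask `observed` records which individual-by-time cells are available, so any estimation routine built on top of it would need to restrict its likelihood or loss to those cells.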