🤖 AI Summary
In digital psychiatry research, imputing missing values in nonstationary multivariate time series, such as ecological momentary assessment (EMA) data collected via mobile devices, remains challenging. Most existing imputation methods assume stationarity or short follow-up periods, and the limited work on nonstationary series either addresses only missing exogenous information or ignores the temporal dependence among outcomes, exposures, and covariates. To address this, we propose MCEM-SSM, a Monte Carlo Expectation-Maximization algorithm for the state-space model, designed for nonstationary, entangled multivariate time series. In simulations of both stationary and nonstationary time series under various missingness mechanisms, MCEM-SSM shows advantages over other widely used imputation strategies. We then apply it to a multi-year smartphone observational study of patients with bipolar disorder or schizophrenia to investigate the association between digital social connectivity and negative mood.
📝 Abstract
Mobile technology (e.g., mobile phones and wearable devices) provides scalable methods for collecting physiological and behavioral biomarkers in patients' naturalistic settings, as well as opportunities for therapeutic advancements and scientific discoveries regarding the etiology of psychiatric illness. Continuous data collection through mobile devices generates highly complex data: entangled multivariate time series of outcomes, exposures, and covariates. Missing data is a pervasive problem in biomedical and social science research, and Ecological Momentary Assessment (EMA) data in psychiatric research is no exception. However, the complex structure of multivariate time series and their non-stationary nature make missing data a major challenge for proper inference. Including additional historical information in time series analyses exacerbates the missing data problem and also complicates confounding adjustment. The majority of existing imputation methods are designed either for stationary time series or for longitudinal data with limited follow-up periods. The limited work on non-stationary time series either focuses on missing exogenous information or ignores the complex temporal dependence among outcomes, exposures, and covariates. We propose a Monte Carlo Expectation Maximization algorithm for the state space model (MCEM-SSM) to effectively handle missing data in non-stationary entangled multivariate time series. We demonstrate the method's advantages over other widely used missing data imputation strategies through simulations of both stationary and non-stationary time series, subject to various missingness mechanisms. Finally, we apply the MCEM-SSM to a multi-year smartphone observational study of patients with bipolar disorder or schizophrenia to investigate the association between digital social connectivity and negative mood.