🤖 AI Summary
Single-cell temporal snapshot data lack individual trajectory observations and exhibit unknown, state-dependent stochasticity, which makes long-term extrapolation challenging. Existing Schrödinger bridge methods are constrained by predefined dynamics or constant diffusion coefficients, limiting their predictive capability.
Method: We propose the first snapshot dynamical modeling framework based on maximum mean discrepancy (MMD), introducing an MMD loss defined over the joint state-time distribution. Coupled with implicit stochastic differential equation parameterization and a robust learning scheme for incomplete observations, our approach enables end-to-end, data-driven learning of state-dependent diffusion coefficients without assuming prior dynamical structure.
Results: The method supports learning from sparse and incomplete state measurements. Evaluated on multiple synthetic and real datasets, it delivers accurate long-term extrapolation, matches or exceeds state-of-the-art methods in interpolation and velocity-field reconstruction in almost all experiments, and successfully infers differentiation outcomes from early stem cell states.
📝 Abstract
Scientists often want to make predictions beyond the observed time horizon of "snapshot" data following latent stochastic dynamics. For example, in time course single-cell mRNA profiling, scientists have access to cellular transcriptional state measurements (snapshots) from different biological replicates at different time points, but they cannot access the trajectory of any one cell because measurement destroys the cell. Researchers want to forecast (e.g.) differentiation outcomes from early state measurements of stem cells. Recent Schrödinger-bridge (SB) methods are natural for interpolating between snapshots. But past SB papers have not addressed forecasting, likely since existing methods either (1) reduce to following pre-set reference dynamics (chosen before seeing data) or (2) require the user to choose a fixed, state-independent volatility since they minimize a Kullback-Leibler divergence. Either case can lead to poor forecasting quality. In the present work, we propose a new framework, SnapMMD, that learns dynamics by directly fitting the joint distribution of both state measurements and observation time with a maximum mean discrepancy (MMD) loss. Unlike past work, our method allows us to infer unknown and state-dependent volatilities from the observed data. We show in a variety of real and synthetic experiments that our method delivers accurate forecasts. Moreover, our approach allows us to learn in the presence of incomplete state measurements and yields an $R^2$-style statistic that diagnoses fit. We also find that our method's performance at interpolation (and general velocity-field reconstruction) is at least as good as (and often better than) state-of-the-art in almost all of our experiments.
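To make the idea of fitting the joint distribution of state and observation time concrete, the following is a minimal sketch and not the SnapMMD implementation. It assumes a one-dimensional state, hand-written drift and volatility functions in place of learned ones, Euler-Maruyama simulation, and a Gaussian kernel with a fixed bandwidth; the names `gaussian_kernel`, `mmd2`, and `simulate_snapshots`, and all numerical settings, are hypothetical. It only compares the MMD of two candidate models against simulated "observed" snapshots and performs no training.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    sq_dists = (np.sum(a**2, axis=1)[:, None]
                + np.sum(b**2, axis=1)[None, :]
                - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth**2))

def mmd2(p_samples, q_samples, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    k_pp = gaussian_kernel(p_samples, p_samples, bandwidth)
    k_qq = gaussian_kernel(q_samples, q_samples, bandwidth)
    k_pq = gaussian_kernel(p_samples, q_samples, bandwidth)
    return k_pp.mean() + k_qq.mean() - 2.0 * k_pq.mean()

def simulate_snapshots(drift, volatility, x0, times, steps_per_unit=50, rng=None):
    """Euler-Maruyama simulation of dX = drift(X) dt + volatility(X) dW.

    Returns rows of joint (state, time) samples, one block per snapshot time.
    Particles are not tracked across blocks by the loss; only the pooled
    (state, time) rows are compared, mimicking snapshot data.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.array(x0, dtype=float)  # shape (n_particles, dim)
    t = 0.0
    rows = []
    for t_obs in times:
        n_steps = max(1, int(round((t_obs - t) * steps_per_unit)))
        dt = (t_obs - t) / n_steps
        for _ in range(n_steps):
            noise = rng.standard_normal(x.shape)
            x = x + drift(x) * dt + volatility(x) * np.sqrt(dt) * noise
        t = t_obs
        # Append the observation time as an extra coordinate of each sample.
        rows.append(np.column_stack([x, np.full(len(x), t_obs)]))
    return np.vstack(rows)

# Assumed toy setup: mean-reverting drift with state-dependent volatility.
drift = lambda x: -x
true_vol = lambda x: 0.5 * np.sqrt(1.0 + x**2)   # state-dependent (assumed)
const_vol = lambda x: 2.0 * np.ones_like(x)      # misspecified constant volatility

times = [0.5, 1.0, 2.0]
x0 = np.full((500, 1), 2.0)

observed = simulate_snapshots(drift, true_vol, x0, times, rng=np.random.default_rng(0))
candidate_good = simulate_snapshots(drift, true_vol, x0, times, rng=np.random.default_rng(1))
candidate_bad = simulate_snapshots(drift, const_vol, x0, times, rng=np.random.default_rng(2))

mmd_good = mmd2(observed, candidate_good)
mmd_bad = mmd2(observed, candidate_bad)
print("MMD^2, matching volatility:    ", mmd_good)
print("MMD^2, misspecified volatility:", mmd_bad)
```

In this toy comparison the candidate with the matching state-dependent volatility attains a smaller joint state-time MMD than the constant-volatility candidate. In a learning setting, the drift and volatility would instead be parameterized (for example by neural networks) and the MMD minimized with respect to their parameters.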