🤖 AI Summary
High-dimensional dynamical systems are often governed by effective variables residing on low-dimensional manifolds, but the need to simultaneously discover these variables and learn their governing equations has long made efficient coarse-grained modeling difficult. This paper proposes a decoupled learning framework: first, a pretrained Foundation Inference Model (FIM) estimates the system's generator (i.e., the drift and diffusion terms) in zero-shot mode, with its weights frozen to disentangle dynamics inference from representation learning; second, an encoder–decoder architecture learns the latent-variable mapping, optimized via a simulation-consistency loss. By sidestepping joint iterative fitting, the method improves training stability and generalization. Evaluated on a stochastic double-well system with semicircle diffusion embedded in synthetic video data, the approach recovers a faithful low-dimensional representation and its dynamics by training only the encoder–decoder, without retraining the generator model, pointing toward fast and reusable coarse-graining pipelines.
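The benchmark system named above can be made concrete with a short simulation. The sketch below assumes a standard double-well drift b(x) = x − x³ and a semicircle-shaped diffusion that vanishes at |x| = radius; the paper's exact coefficients are not given in the summary, so these forms (and the `radius`/`scale` parameters) are illustrative choices, integrated with Euler–Maruyama:

```python
import numpy as np

def drift(x):
    # Double-well drift b(x) = -V'(x) for the potential V(x) = x^4/4 - x^2/2
    # (assumed form; the summary only names the system, not its coefficients)
    return x - x**3

def diffusion(x, radius=2.0, scale=0.5):
    # Semicircle diffusion: largest at the origin, vanishing at |x| = radius
    # (illustrative choice of shape and parameters)
    return scale * np.sqrt(np.maximum(0.0, radius**2 - x**2))

def euler_maruyama(x0, dt=1e-3, n_steps=10_000, seed=0):
    # Simulate dX = b(X) dt + sigma(X) dW with the Euler-Maruyama scheme
    rng = np.random.default_rng(seed)
    xs = np.empty(n_steps + 1)
    xs[0] = x0
    for k in range(n_steps):
        dw = rng.normal(0.0, np.sqrt(dt))
        xs[k + 1] = xs[k] + drift(xs[k]) * dt + diffusion(xs[k]) * dw
    return xs

path = euler_maruyama(x0=0.1)
```

Trajectories produced this way hop stochastically between the two wells near x = ±1; rendering them into frames would give the kind of synthetic video data the paper evaluates on.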
📝 Abstract
High-dimensional recordings of dynamical processes are often characterized by a much smaller set of effective variables, evolving on low-dimensional manifolds. Identifying these latent dynamics requires solving two intertwined problems: discovering appropriate coarse-grained variables and simultaneously fitting the governing equations. Most machine learning approaches tackle these tasks jointly by training autoencoders together with models that enforce dynamical consistency. We propose to decouple the two problems by leveraging the recently introduced Foundation Inference Models (FIMs). FIMs are pretrained models that estimate the infinitesimal generators of dynamical systems (e.g., the drift and diffusion of a stochastic differential equation) in zero-shot mode. By amortizing the inference of the dynamics through a FIM with frozen weights, and training only the encoder-decoder map, we define a simple, simulation-consistent loss that stabilizes representation learning. A proof of concept on a stochastic double-well system with semicircle diffusion, embedded into synthetic video data, illustrates the potential of this approach for fast and reusable coarse-graining pipelines.
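The decoupled training objective described in the abstract can be sketched in a few lines. Everything below is a simplified stand-in: `encode`, `decode`, and `fim_estimate` are hypothetical callables (the real pipeline uses a neural encoder–decoder and a pretrained FIM with frozen weights, whose interface the abstract does not specify). The key point is the shape of the loss: the frozen model infers drift and diffusion from the encoded latents, the latents are re-simulated under that estimate, and the decoded rollout is compared to the observed frames:

```python
import numpy as np

def simulation_consistency_loss(frames, encode, decode, fim_estimate, dt, rng):
    """Decoupled-training loss sketch: only encode/decode carry gradients in
    the real method; fim_estimate stands in for the frozen, zero-shot FIM."""
    z = encode(frames)                        # (T,) latent trajectory
    drift_fn, diff_fn = fim_estimate(z, dt)   # zero-shot generator estimate
    z_sim = np.empty_like(z)                  # re-simulate latents under it
    z_sim[0] = z[0]
    for k in range(len(z) - 1):
        dw = rng.normal(0.0, np.sqrt(dt))
        z_sim[k + 1] = z_sim[k] + drift_fn(z_sim[k]) * dt + diff_fn(z_sim[k]) * dw
    recon = decode(z_sim)                     # decoded rollout, frame-shaped
    return np.mean((recon - frames) ** 2)

# Toy usage: identity-like maps and a noiseless "estimate" dz = 1*dt, so the
# re-simulated latents retrace the observed ones and the loss is ~0.
frames = np.linspace(0.0, 1.0, 50)[:, None]   # (T, 1) one-pixel "video"
encode = lambda f: f[:, 0]
decode = lambda z: z[:, None]
fim_estimate = lambda z, dt: (lambda x: 1.0, lambda x: 0.0)
loss = simulation_consistency_loss(frames, encode, decode, fim_estimate,
                                   dt=1.0 / 49, rng=np.random.default_rng(0))
```

Because `fim_estimate` is frozen, gradients in the actual method flow only through the encoder and decoder, which is what stabilizes representation learning relative to jointly fitting the dynamics.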