AI Summary
This work addresses the high dimensionality and multiscale nature of cold dark matter (CDM) simulation field data by proposing the first unsupervised flow-matching generative model tailored for cosmological field data, enabling scale-aware latent representation learning. Methodologically, it employs the flow-matching framework to directly model the field distribution without supervision, learning a compact (32× compression), semantically interpretable low-dimensional latent space, where distinct latent channels explicitly encode features at different cosmological scales. Contributions include: (i) the first application of flow matching to cosmological field modeling; (ii) end-to-end disentanglement of multiscale features with physically grounded interpretability; and (iii) support for high-fidelity field reconstruction, physically consistent synthetic data generation, and cosmological parameter inference with sub-percent accuracy.
Abstract
Generative machine learning models can learn low-dimensional representations of data that preserve the information required for downstream tasks. In this work, we demonstrate that flow-matching-based generative models can learn compact, semantically rich latent representations of field-level cold dark matter (CDM) simulation data without supervision. Our model, CosmoFlow, learns representations 32× smaller than the raw field data that are usable for field-level reconstruction, synthetic data generation, and parameter inference. The learned representations are also interpretable: different latent channels correspond to features at different cosmological scales.
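For readers unfamiliar with flow matching, the training objective can be illustrated with a minimal NumPy sketch. It assumes the common straight-line (linear interpolation) probability path between Gaussian noise and data; the abstract does not state which path CosmoFlow uses. The names `flow_matching_loss`, `velocity_fn`, and `zero_model` are hypothetical, the random array is only a stand-in for simulated fields, and the encoder and latent conditioning that make up the actual model are omitted.

```python
import numpy as np

def flow_matching_loss(velocity_fn, x1, rng):
    """Monte Carlo estimate of a conditional flow-matching loss for a batch x1.

    Assumes a straight-line path x_t = (1 - t) * x0 + t * x1 between noise x0
    and data x1, whose velocity is x1 - x0 (an illustrative choice, not
    necessarily the one used by CosmoFlow).
    """
    x0 = rng.standard_normal(x1.shape)                      # noise sample per data sample
    t = rng.random((x1.shape[0],) + (1,) * (x1.ndim - 1))   # one time in [0, 1) per sample
    xt = (1.0 - t) * x0 + t * x1                            # point on the path at time t
    target = x1 - x0                                        # velocity of the straight path
    pred = velocity_fn(xt, t)                               # model's predicted velocity
    return float(np.mean((pred - target) ** 2))

rng = np.random.default_rng(0)
fields = rng.standard_normal((4, 16, 16))   # stand-in batch; real inputs would be CDM fields

# Hypothetical untrained "model" that always predicts zero velocity.
zero_model = lambda xt, t: np.zeros_like(xt)

loss = flow_matching_loss(zero_model, fields, rng)
print(loss)
```

In practice `velocity_fn` would be a neural network trained to minimize this loss, and in a representation-learning setup like the one described here it would presumably also receive the compressed latent as a conditioning input.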