🤖 AI Summary
This paper addresses two challenges: detecting model misspecification in high-dimensional cosmological data, and reducing dimensionality without discarding information. It proposes a Bayesian evidence estimation framework that integrates scale-dependent neural summary statistics with normalizing flows. The method introduces a scale-aware neural network architecture for joint data compression and model evidence estimation across multiple physical scales, which makes it possible to localize model failures to specific scales. Validated on CAMELS simulations of matter and gas density fields under varying subgrid physics prescriptions, it demonstrates high sensitivity to model misspecification and interpretable diagnostic capability. Key contributions are: (1) the first scale-resolved neural summary statistics; (2) end-to-end, differentiable Bayesian model validation; and (3) simultaneous preservation of information fidelity and a low-dimensional representation, significantly enhancing both the diagnostic accuracy and the physical interpretability of cosmological models.
📝 Abstract
Current and upcoming cosmological surveys will produce unprecedented amounts of high-dimensional data. Modeling these data requires complex, high-fidelity forward simulations that accurately capture both the physical processes and the systematic effects involved in the data-generation process. However, validating whether our theoretical models accurately describe the observed datasets remains a fundamental challenge. A further complication is choosing representations of the data that retain all the relevant cosmological information while reducing the dimensionality of the original dataset. In this work, we present a novel framework that combines scale-dependent neural summary statistics with normalizing flows to detect model misspecification in cosmological simulations through Bayesian evidence estimation. By conditioning our neural network models for data compression and evidence estimation on the smoothing scale, we systematically identify, in a data-driven manner, the scales at which theoretical models break down. We demonstrate a first application of our approach using matter and gas density fields from three CAMELS simulation suites with different subgrid physics implementations.
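The idea of scanning the smoothing scale for model failures can be illustrated with a toy sketch. This is not the paper's implementation, and every name in it is hypothetical. White-noise fields stand in for the simulated density fields, the log-variance of the smoothed field stands in for the learned neural summary, and a Gaussian fitted to the simulated summaries stands in for the normalizing flow. The mock observation carries extra small-scale structure that the reference model lacks.

```python
import numpy as np

def gaussian_smooth(field, scale):
    """Smooth a square, periodic 2D field with a Gaussian kernel (scale in pixels)."""
    n = field.shape[0]
    k = 2.0 * np.pi * np.fft.fftfreq(n)            # angular wavenumber per pixel
    kx, ky = np.meshgrid(k, k, indexing="ij")
    window = np.exp(-0.5 * (kx**2 + ky**2) * scale**2)
    return np.fft.ifft2(np.fft.fft2(field) * window).real

def summary(field, scale):
    """Toy summary statistic (assumption): log-variance of the smoothed field."""
    return np.log(np.var(gaussian_smooth(field, scale)))

def log_evidence(observed, simulated, scale):
    """Log-density of the observed summary under a Gaussian fitted to the
    simulated summaries. The Gaussian is a stand-in for a normalizing flow."""
    sims = np.array([summary(f, scale) for f in simulated])
    mu, sigma = sims.mean(), sims.std(ddof=1)
    z = (summary(observed, scale) - mu) / sigma
    return -0.5 * z**2 - np.log(sigma * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
n = 64

# Reference model: unit-variance white noise (assumption, for illustration only).
simulated = [rng.standard_normal((n, n)) for _ in range(50)]

# Mock observation: the same model plus high-pass-filtered noise, so the
# mismatch is concentrated on small scales.
extra = rng.standard_normal((n, n))
observed = rng.standard_normal((n, n)) + (extra - gaussian_smooth(extra, 4.0))

scales = (1.0, 2.0, 8.0)
log_ev = {r: log_evidence(observed, simulated, r) for r in scales}
for r in scales:
    print(f"smoothing scale {r:4.1f} px: log evidence = {log_ev[r]:8.2f}")
```

In this toy setup the log evidence should drop sharply at the smallest smoothing scale, where the added structure survives the filter, and recover at the largest scale, where it is smoothed away. A scale-resolved diagnostic looks for exactly this pattern.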