Learning From Simulators: A Theory of Simulation-Grounded Learning

📅 2025-09-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the lack of theoretical foundations for Simulation-Grounded Neural Networks (SGNNs), which are used in scientific modeling settings where ground-truth labels are scarce or unavailable. The framework trains neural networks on synthetic data generated by mechanistic simulators, so that the trained network amortizes approximate Bayesian inference. The paper proves that SGNNs converge to the Bayes-optimal predictor under mild regularity conditions. The approach allows unobserved latent variables to be identified and predicted, and it provides posterior-consistent interpretability by attributing predictions to the simulated mechanisms that generated them. In experiments, SGNNs accurately recover latent parameters, achieve half the error of AIC on a task that discriminates between mechanistic dynamics, and remain robust under model misspecification, outperforming conventional statistical methods.

📝 Abstract

Simulation-Grounded Neural Networks (SGNNs) are predictive models trained entirely on synthetic data from mechanistic simulations. They have achieved state-of-the-art performance in domains where real-world labels are limited or unobserved, but lack a formal underpinning. We present the foundational theory of simulation-grounded learning. We show that SGNNs implement amortized Bayesian inference under a simulation prior and converge to the Bayes-optimal predictor. We derive generalization bounds under model misspecification and prove that SGNNs can learn unobservable scientific quantities that empirical methods provably cannot. We also formalize a novel form of mechanistic interpretability uniquely enabled by SGNNs: by attributing predictions to the simulated mechanisms that generated them, SGNNs yield posterior-consistent, scientifically grounded explanations. We provide numerical experiments to validate all theoretical predictions. SGNNs recover latent parameters, remain robust under mismatch, and outperform classical tools: in a model selection task, SGNNs achieve half the error of AIC in distinguishing mechanistic dynamics. These results establish SGNNs as a principled and practical framework for scientific prediction in data-limited regimes.
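The claim that a predictor trained only on simulated pairs converges to the Bayes-optimal predictor can be illustrated on a toy problem with a known posterior. The sketch below is not the paper's method or experiments: it assumes a hypothetical simulator with prior θ ~ N(0, 1) and observations x_i | θ ~ N(θ, σ²), and it swaps the neural network for a linear least-squares predictor, which is sufficient here because the exact posterior mean is linear in x. All function and variable names are illustrative.

```python
import numpy as np

def simulate(num_sims, n_obs, sigma, rng):
    """Hypothetical simulator: draw theta from the prior, then noisy observations."""
    theta = rng.standard_normal(num_sims)                      # theta ~ N(0, 1)
    x = theta[:, None] + sigma * rng.standard_normal((num_sims, n_obs))
    return x, theta

def fit_linear_predictor(x, theta):
    """Least-squares fit of theta on x (stand-in for training a network with MSE loss)."""
    design = np.hstack([x, np.ones((x.shape[0], 1))])          # add intercept column
    coef, *_ = np.linalg.lstsq(design, theta, rcond=None)
    return coef[:-1], coef[-1]                                 # weights, bias

rng = np.random.default_rng(0)
n_obs, sigma = 5, 1.0

# Train purely on synthetic (x, theta) pairs; no real labels are involved.
x_train, theta_train = simulate(200_000, n_obs, sigma, rng)
weights, bias = fit_linear_predictor(x_train, theta_train)

# Exact posterior mean for this conjugate model: sum(x) / (n_obs + sigma^2).
exact_weight = 1.0 / (n_obs + sigma**2)

# Compare the learned predictor with the exact posterior mean on fresh simulations.
x_test, _ = simulate(1_000, n_obs, sigma, rng)
learned = x_test @ weights + bias
exact = exact_weight * x_test.sum(axis=1)
max_gap = np.max(np.abs(learned - exact))

print("learned weights:", np.round(weights, 4))
print("exact weight:   ", round(exact_weight, 4))
print("max |learned - exact posterior mean|:", max_gap)
```

In this toy setting the latent θ is never observed directly, yet the fitted predictor approaches the posterior mean as the number of simulations grows. With a nonlinear simulator the linear fit would no longer suffice, which is where a neural network would take its place.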

Problem

Research questions and friction points this paper is trying to address.

Developing theoretical foundations for simulation-trained neural networks, which currently lack a formal underpinning

Proving that SGNNs can learn unobservable scientific quantities that empirical methods provably cannot

Establishing SGNNs as a principled framework for scientific prediction in data-limited regimes

Innovation

Methods, ideas, or system contributions that make the work stand out.

SGNNs implement amortized Bayesian inference

SGNNs converge to the Bayes-optimal predictor

SGNNs enable mechanistic interpretability via simulation attributions
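
The link between amortized Bayesian inference and the Bayes-optimal predictor can be made concrete with a standard decomposition. The block below is a sketch under assumptions not stated on this page: a scalar target θ, squared-error loss, a simulation prior π(θ), and a simulator likelihood p(x | θ). The paper's own loss, notation, and regularity conditions may differ.

```latex
% Risk of a predictor f under the simulation distribution \pi(\theta)\, p(x \mid \theta)
R(f) = \mathbb{E}_{\theta \sim \pi}\, \mathbb{E}_{x \sim p(\cdot \mid \theta)}
       \bigl[ (f(x) - \theta)^2 \bigr]

% Condition on x and expand around the posterior mean \mathbb{E}[\theta \mid x]
R(f) = \mathbb{E}_{x} \bigl[ (f(x) - \mathbb{E}[\theta \mid x])^2 \bigr]
     + \mathbb{E}_{x} \bigl[ \operatorname{Var}(\theta \mid x) \bigr]

% The second term does not depend on f, so the minimizer is the posterior mean
f^{*}(x) = \mathbb{E}[\theta \mid x] = \int \theta \, \pi(\theta \mid x) \, d\theta
```

Under these assumptions, minimizing the simulated training loss is equivalent to approximating the posterior mean once for all x, which is what "amortized" refers to: the cost of inference is paid during training and not per observation.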
