🤖 AI Summary
To address the high computational cost of high-fidelity numerical simulations in scientific machine learning, this work investigates the computational budget–performance trade-off for neural surrogate models trained on multi-fidelity data. Focusing on Reynolds-Averaged Navier–Stokes (RANS) turbulence simulations, we construct a hybrid dataset comprising low-fidelity (low-order) and high-fidelity (high-order) simulations, and propose and empirically validate a **compute-budget-decomposed multi-fidelity scaling law**. Through systematic experiments that vary how a fixed total computational budget is allocated between fidelity levels, followed by performance fitting, we identify budget-dependent optimal low-to-high-fidelity data ratios that improve both predictive accuracy and training efficiency. This study provides the first empirical characterization of scaling behavior in multi-fidelity neural surrogates, yielding a quantitative framework and practical guidelines for compute-efficient data generation and surrogate modeling in scientific computing.
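As a rough illustration of the kind of budget-allocation sweep described above, the sketch below splits a fixed compute budget between low- and high-fidelity samples, records the best error across candidate mixes, and fits a power law to the resulting compute–error curve. It is not the authors' code: the per-sample costs (`COST_LO`, `COST_HI`), the synthetic `surrogate_test_error` stand-in, and the fitted power-law form are all illustrative assumptions.

```python
import numpy as np

# Hypothetical per-sample generation costs in arbitrary units
# (assumptions, not values from the paper).
COST_LO, COST_HI = 1.0, 20.0

def sample_counts(budget: float, phi_lo: float) -> tuple[int, int]:
    """Split a fixed budget: fraction `phi_lo` is spent on low-fidelity samples."""
    n_lo = int(phi_lo * budget / COST_LO)
    n_hi = int((1.0 - phi_lo) * budget / COST_HI)
    return n_lo, n_hi

def surrogate_test_error(n_lo: int, n_hi: int) -> float:
    """Stand-in for 'train a surrogate on (n_lo, n_hi) samples and evaluate it'.
    Replace with an actual training/evaluation loop; the functional form here
    (low-fidelity samples worth a fraction of high-fidelity ones) is assumed."""
    rng = np.random.default_rng(n_lo + 31 * n_hi)
    effective_samples = n_hi + 0.3 * n_lo
    return 1.0 / np.sqrt(1.0 + effective_samples) + 0.005 * rng.random()

budgets = np.array([200.0, 500.0, 1000.0, 2000.0, 5000.0])
mixes = [0.0, 0.25, 0.5, 0.75, 0.9]  # candidate low-fidelity budget fractions

# For every budget, sweep the fidelity mix and keep the best-performing one.
best_err, best_mix = [], []
for b in budgets:
    errs = [surrogate_test_error(*sample_counts(b, phi)) for phi in mixes]
    i = int(np.argmin(errs))
    best_err.append(errs[i])
    best_mix.append(mixes[i])
    print(f"budget={b:6.0f}  best low-fidelity fraction={mixes[i]:.2f}  error={errs[i]:.4f}")

# Fit a simple power law E(B) ~ a * B^(-alpha) to the best-mix errors (log-log fit).
slope, intercept = np.polyfit(np.log(budgets), np.log(best_err), 1)
print(f"fitted scaling law: E(B) ≈ {np.exp(intercept):.3f} * B^({slope:.3f})")
```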
📝 Abstract
Scaling laws describe how model performance improves with data, parameters, and compute. While large datasets can usually be collected at relatively low cost in domains such as language or vision, scientific machine learning is often limited by the high expense of generating training data through numerical simulations. However, by adjusting modeling assumptions and approximations, simulation fidelity can be traded for computational cost, a degree of freedom absent in other domains. We investigate this trade-off between data fidelity and cost in neural surrogates using low- and high-fidelity Reynolds-Averaged Navier–Stokes (RANS) simulations. Reformulating classical scaling laws, we decompose the dataset axis into a compute budget and a dataset composition. Our experiments reveal compute–performance scaling behavior and show that the optimal fidelity mix depends on the available budget for the given dataset configuration. These findings constitute the first empirical study of scaling laws for multi-fidelity neural surrogate datasets and offer practical considerations for compute-efficient dataset generation in scientific machine learning.
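One way to make the reformulation concrete, under assumed notation (total budget $B$, low-fidelity budget fraction $\phi$, per-sample costs $c_{\mathrm{lo}}$ and $c_{\mathrm{hi}}$; these symbols are not taken from the paper), is the following sketch: the classical data-scaling ansatz is kept, but the dataset-size axis is replaced by a budget and a composition.

```latex
% Classical single-fidelity data scaling (power-law ansatz):
%   E(D) = a\,D^{-\alpha} + E_\infty
% Assumed multi-fidelity reformulation: a fixed compute budget B is split by a
% fraction \phi into low- and high-fidelity samples with per-sample costs
% c_lo and c_hi, so performance becomes a function of (B, \phi):
\begin{align}
  D_{\mathrm{lo}}(B,\phi) &= \frac{\phi\,B}{c_{\mathrm{lo}}}, &
  D_{\mathrm{hi}}(B,\phi) &= \frac{(1-\phi)\,B}{c_{\mathrm{hi}}}, \\
  E(B,\phi) &= f\bigl(D_{\mathrm{lo}}(B,\phi),\, D_{\mathrm{hi}}(B,\phi)\bigr), &
  \phi^{*}(B) &= \arg\min_{\phi\in[0,1]} E(B,\phi).
\end{align}
```

Read this way, the budget-dependent optimal mix $\phi^{*}(B)$ is precisely the quantity the allocation experiments sweep for at each fixed budget.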