🤖 AI Summary
This study addresses the lack of standardized, reproducible evaluation benchmarks for surrogate modeling. To this end, we introduce a large-scale, fully open-source, automatically normalized, and cross-study reproducible surrogate model evaluation framework. It systematically assesses 29 surrogate models across 60 benchmark functions and 40 real-world datasets. Implemented in R, the framework integrates an automated simulation pipeline, input-adaptive scaling, and a standardized testing protocol; it is distributed publicly as the R package *duqling*, enabling fair one-click comparisons and fully reproducible results. Experimental results reveal systematic trade-offs among prediction accuracy, robustness, generalization capability, and computational efficiency across model families. These findings provide empirically grounded guidance for method developers and practitioners, along with best-practice recommendations for surrogate model selection and deployment.
📝 Abstract
Accurate and efficient surrogate modeling is essential for modern computational science, and there are a staggering number of emulation methods to choose from. With new methods being developed all the time, comparing the relative strengths and weaknesses of different methods remains a challenge due to inconsistent benchmarking practices and (sometimes) limited reproducibility and transparency. In this work, we present a large-scale, fully reproducible comparison of $29$ distinct emulators across $60$ canonical test functions and $40$ real emulation datasets. To facilitate rigorous, apples-to-apples comparisons, we introduce the R package *duqling*, which streamlines reproducible simulation studies using a consistent, simple syntax and automatic internal scaling of inputs. This framework allows researchers to compare emulators in a unified environment and makes it possible to replicate or extend previous studies with minimal effort, even across different publications. Our results provide detailed empirical insight into the strengths and weaknesses of state-of-the-art emulators and offer guidance both for method developers and for practitioners selecting a surrogate for new data. We discuss best practices for emulator comparison and highlight how *duqling* can accelerate research in emulator design and application.