SimulCost: A Cost-Aware Benchmark and Toolkit for Automating Physics Simulations with LLMs

📅 2026-03-11
📈 Citations: 1
✨ Influential: 0
🤖 AI Summary
This work addresses a gap in how large language models (LLMs) are evaluated on scientific tasks: existing evaluations count token costs but ignore tool-use costs such as simulation time and computational resources, which makes metrics like pass@k impractical under realistic budget constraints. The authors introduce SimulCost, the first benchmark targeting cost-sensitive parameter tuning in physics simulations, with an analytically defined, platform-independent cost for each of 12 simulators spanning fluid dynamics, solid mechanics, and plasma physics. Evaluating frontier LLMs in both single-round (initial guess) and multi-round (trial-and-error adjustment) modes, they find single-round success rates of 46–64% (35–54% under high-accuracy requirements), improving to 71–80% in multi-round mode, although the LLMs are 1.5–2.5× slower than traditional parameter scanning. The project is released as an open-source static benchmark and extensible toolkit to advance research on cost-aware agent design.
πŸ“ Abstract
Evaluating LLM agents for scientific tasks has focused on token costs while ignoring tool-use costs like simulation time and experimental resources. As a result, metrics like pass@k become impractical under realistic budget constraints. To address this gap, we introduce SimulCost, the first benchmark targeting cost-sensitive parameter tuning in physics simulations. SimulCost compares LLM tuning of cost-sensitive parameters against a traditional scanning approach in both accuracy and computational cost, spanning 2,916 single-round (initial guess) and 1,900 multi-round (adjustment by trial-and-error) tasks across 12 simulators from fluid dynamics, solid mechanics, and plasma physics. Each simulator's cost is analytically defined and platform-independent. Frontier LLMs achieve 46–64% success rates in single-round mode, dropping to 35–54% under high accuracy requirements, which makes their initial guesses unreliable, especially for high-accuracy tasks. Multi-round mode improves rates to 71–80%, but LLMs are 1.5–2.5x slower than traditional scanning, making them uneconomical choices. We also investigate parameter group correlations for knowledge transfer potential, and the impact of in-context examples and reasoning effort, providing practical implications for deployment and fine-tuning. We open-source SimulCost as a static benchmark and extensible toolkit to facilitate research on improving cost-aware agentic designs for physics simulations and on expanding to new simulation environments. Code and data are available at https://github.com/Rose-STL-Lab/SimulCost-Bench.
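To make the idea of comparing a tuner against a scan on accumulated simulation cost more concrete, here is a minimal toy sketch. It is not taken from SimulCost: the error model (`toy_error`), the analytic cost model (`toy_cost`), the doubling scan, and all function names are assumptions chosen purely for illustration, and the benchmark's actual simulators, cost definitions, and success criteria may differ.

```python
from dataclasses import dataclass
from typing import List, Optional


def toy_error(n: int) -> float:
    """Hypothetical error model: assumed second-order convergence, error = 1/n^2."""
    return 1.0 / (n * n)


def toy_cost(n: int) -> int:
    """Hypothetical analytic cost: n cells times n time steps (no wall-clock timing)."""
    return n * n


@dataclass
class Result:
    n: int           # resolution of the first run that met the tolerance
    total_cost: int  # cost summed over every run, including failed ones
    runs: int        # number of simulator calls made


def scan(tolerance: float, start: int = 4) -> Result:
    """Baseline scan: double n until the tolerance is met, charging for every run."""
    n, total, runs = start, 0, 0
    while True:
        total += toy_cost(n)
        runs += 1
        if toy_error(n) <= tolerance:
            return Result(n, total, runs)
        n *= 2


def score_guesses(guesses: List[int], tolerance: float) -> Optional[Result]:
    """Replay a tuner's proposed n values in order; stop at the first that meets tolerance."""
    total = 0
    for runs, n in enumerate(guesses, start=1):
        total += toy_cost(n)
        if toy_error(n) <= tolerance:
            return Result(n, total, runs)
    return None  # the tuner never reached the tolerance within its rounds


if __name__ == "__main__":
    baseline = scan(1e-3)
    tuned = score_guesses([20, 40], 1e-3)  # made-up guesses standing in for a tuner
    print(baseline, tuned)
    if tuned is not None:
        print("cost ratio vs. scan:", tuned.total_cost / baseline.total_cost)
```

The point of the sketch is only that every simulator call is charged, including failed attempts, so a tuner that reaches the tolerance in fewer rounds can still end up costing more than the scan.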
Problem

Research questions and friction points this paper is trying to address.

cost-aware evaluation
physics simulations
LLM agents
tool-use cost
parameter tuning
Innovation

Methods, ideas, or system contributions that make the work stand out.

cost-aware simulation
LLM agents
physics simulation benchmark
parameter tuning
tool-use cost
Authors

Yadi Cao
University of California San Diego
Scientific Machine Learning, Numerical PDEs, Computational Mechanics, Fluid Dynamics
Sicheng Lai
The Chinese University of Hong Kong, Shenzhen
Jiahe Huang
University of California San Diego
Yang Zhang
Peking University
Zach Lawrence
University of California San Diego
Rohan Bhakta
University of California San Diego
Izzy F. Thomas
University of California San Diego
Mingyun Cao
University of California, Los Angeles
Chung-Hao Tsai
University of California San Diego
Zihao Zhou
University of California San Diego
Yidong Zhao
ETH Zurich
Hao Liu
California Institute of Technology
Machine Learning
Alessandro Marinoni
Professor, University of California San Diego
Plasma physics, turbulence, optics, numerical modeling, signal processing
Alexey Arefiev
University of California San Diego
Rose Yu
Associate Professor, University of California, San Diego
Machine Learning, Computational Sustainability