🤖 AI Summary
This work addresses the challenges of applying large language models (LLMs) in modeling and simulation (M&S), where suboptimal prompt design, improper hyperparameter configuration, or inadequate data handling often leads to performance degradation, information loss, and non-deterministic behavior. For the first time, this study systematically identifies latent pitfalls specific to LLM deployment in M&S and proposes a principled framework centered on rigorous design and empirical evaluation. The framework encompasses key techniques including prompt engineering, retrieval-augmented generation (RAG), low-rank adaptation (LoRA), temperature control, and context management. By offering a structured set of practical guidelines, this research enables practitioners to critically assess whether, and how, LLMs should be deployed in M&S contexts, thereby substantially enhancing their effectiveness and reliability.
📝 Abstract
Large language models (LLMs) have rapidly become familiar tools to researchers and practitioners. Concepts such as prompting, temperature, or few-shot examples are now widely recognized, and LLMs are increasingly used in Modeling & Simulation (M&S) workflows. However, practices that appear straightforward may introduce subtle issues, unnecessary complexity, or even inferior results. Adding more data can backfire (e.g., degrading performance through model collapse or inadvertently wiping out existing guardrails); spending time on fine-tuning a model can be unnecessary without a prior assessment of what it already knows; setting the temperature to 0 is not sufficient to make LLMs deterministic; and providing a large volume of M&S data as input can be excessive (LLMs cannot attend to everything), while naive simplifications can lose information. We aim to provide comprehensive and practical guidance on how to use LLMs, with an emphasis on M&S applications. We discuss common sources of confusion, including non-determinism, knowledge augmentation (including RAG and LoRA), decomposition of M&S data, and hyperparameter settings. We emphasize principled design choices, diagnostic strategies, and empirical evaluation, with the goal of helping modelers make informed decisions about when, how, and whether to rely on LLMs.