🤖 AI Summary
This work addresses the opacity of existing large language model (LLM)-driven simulation-based decision systems, which treat scientific simulators as black boxes and do not reason explicitly about their underlying mechanisms and assumptions. To overcome this limitation, the authors propose MechSim, a framework that introduces mechanism-level reasoning into the interaction between LLMs and scientific simulators. MechSim uses a structured mechanistic representation of a simulator’s assumptions, variable dependencies, and execution traces, and combines neuro-symbolic reasoning with a constraint engine so that LLMs can perform explainable, traceable, and constraint-aware inference. Experiments across multiple high-stakes domains show that MechSim improves the quality of mechanism-level explanations, simulator analysis, and the reliability of downstream decisions, extending neuro-symbolic reasoning beyond the static symbolic structures that prior approaches primarily operate on.
📝 Abstract
Scientific simulators are increasingly being integrated into LLM-driven systems for high-stakes simulation-driven decision-making. However, existing frameworks primarily use LLMs to generate, calibrate, or execute simulators, treating them as black-box interfaces rather than as structured mechanistic systems that can be reasoned about. As a result, current approaches lack the ability to identify, represent, and reason about the assumptions and mechanisms underlying simulator behavior, limiting transparency, auditability, and decision justification. We introduce MechSim, a mechanism-grounded neuro-symbolic reasoning framework for executable scientific simulators. Unlike prior neuro-symbolic approaches that primarily reason over static symbolic structures, MechSim enables LLM agents to reason about the mechanisms, assumptions, and execution behavior of scientific simulators. Our framework represents simulators through a shared structured schema capturing assumptions, variables, mechanism dependencies, and execution traces. On top of this representation, LLM agents operate as constrained reasoning engines that generate structured, evidence-grounded explanations linking simulator outcomes to their underlying mechanisms. We evaluate our approach across multiple high-stakes domains and show that it improves mechanism-level explanation quality, simulator analysis, and downstream decision-making reliability.
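The abstract names the four components of the shared schema (assumptions, variables, mechanism dependencies, and execution traces) but does not specify its format. The following is a minimal Python sketch of what such a representation might look like, under stated assumptions: every class, field, and method name here (`SimulatorSchema`, `explain`, `unsupported_citations`), and the toy epidemic example, is hypothetical and is not taken from MechSim. It illustrates how an outcome variable could be linked back to the mechanisms, assumptions, and trace entries that support it.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Mechanism:
    """One mechanism: which variables it reads, writes, and which assumptions it relies on."""
    id: str
    inputs: list[str]
    outputs: list[str]
    assumptions: list[str]


@dataclass
class TraceStep:
    """One recorded execution step: the mechanism that ran and the values it produced."""
    mechanism_id: str
    values: dict[str, float]


@dataclass
class SimulatorSchema:
    """Hypothetical schema holding the four components named in the abstract."""
    variables: set[str]
    assumptions: dict[str, str]          # assumption id -> plain-language statement
    mechanisms: dict[str, Mechanism]     # mechanism id -> mechanism
    trace: list[TraceStep] = field(default_factory=list)

    def explain(self, variable: str) -> dict:
        """Walk dependencies backward from `variable` and collect supporting evidence."""
        mechanisms: list[str] = []
        assumptions: list[str] = []
        seen: set[str] = set()
        stack = [variable]
        while stack:
            var = stack.pop()
            if var in seen:
                continue
            seen.add(var)
            for mech in self.mechanisms.values():
                if var in mech.outputs and mech.id not in mechanisms:
                    mechanisms.append(mech.id)
                    for a in mech.assumptions:
                        if a not in assumptions:
                            assumptions.append(a)
                    stack.extend(mech.inputs)
        evidence = [s for s in self.trace if s.mechanism_id in mechanisms]
        return {"mechanisms": mechanisms, "assumptions": assumptions, "evidence": evidence}

    def unsupported_citations(self, cited_mechanisms: list[str]) -> list[str]:
        """Return cited mechanism ids that the schema does not define (a simple constraint check)."""
        return [m for m in cited_mechanisms if m not in self.mechanisms]


# Toy epidemic simulator, invented purely for illustration.
schema = SimulatorSchema(
    variables={"beta", "contacts", "infected", "hospital_load"},
    assumptions={
        "homogeneous_mixing": "Every individual is equally likely to contact any other.",
        "fixed_hosp_rate": "A constant fraction of infections requires hospital care.",
    },
    mechanisms={
        "transmission": Mechanism(
            "transmission", ["beta", "contacts"], ["infected"], ["homogeneous_mixing"]
        ),
        "hospitalization": Mechanism(
            "hospitalization", ["infected"], ["hospital_load"], ["fixed_hosp_rate"]
        ),
    },
    trace=[
        TraceStep("transmission", {"infected": 120.0}),
        TraceStep("hospitalization", {"hospital_load": 6.0}),
    ],
)

explanation = schema.explain("hospital_load")
print(explanation["mechanisms"], explanation["assumptions"])
```

In a setup like this, an LLM agent's explanation could be checked against the schema before it is accepted, for example by rejecting any explanation for which `unsupported_citations` returns a non-empty list. Whether MechSim enforces constraints in this way is not stated in the abstract.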