Statistical process discovery

📅 2025-04-30

🤖 AI Summary

Problem: Scalability bottlenecks in stochastic process model discovery arise from reliance on exact computation of the model's stochastic language, which scales poorly with model size. Method: The paper proposes a framework that integrates simulation-based Bayesian parameter inference with statistical model checking (SMC). SMC approximates the model's stochastic language with tunable precision, avoiding costly exact computation, while the inference scheme searches the parameter space for the model instance that best reproduces the stochastic behavior observed in the event log. Contribution/Results: The approach is validated on several popular event logs from real-life systems. Because it is simulation-based, runtime can be traded against accuracy, which makes large models tractable where non-simulation-based alternatives would incur prohibitive runtime.

📝 Abstract

Stochastic process discovery is concerned with deriving a model capable of reproducing the stochastic character of observed executions of a given process, stored in a log. This leads to an optimisation problem in which the model's parameter space is searched, driven by the resemblance between the log's and the model's stochastic languages. The bottleneck of this optimisation problem lies in determining the model's stochastic language, which existing approaches handle through exact computation that scales poorly. In this paper we introduce a novel framework that combines a simulation-based Bayesian parameter inference scheme, used to search for the "optimal" instance of a stochastic model, with an expressive statistical model checking engine, used during inference to approximate the language of the model instance under consideration. Because the framework is simulation-based, the runtime for discovering the optimal instance of a model can easily be traded against accuracy, which makes it possible to treat large models that would incur a prohibitive runtime with non-simulation-based alternatives. We validate our approach on several popular event logs concerning real-life systems.
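The loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration, not the paper's method: the toy model (an activity "a", an optional repeated "b", then "c"), the uniform prior over two weights, the total variation distance, and the plain rejection form of approximate Bayesian computation are all stand-ins, and the names `simulate_trace`, `total_variation` and `abc_rejection` are hypothetical. The model's stochastic language is never computed exactly; it is estimated from simulated traces and compared with the relative trace frequencies of the log.

```python
import random
from collections import Counter

def stochastic_language(traces):
    """Map each distinct trace to its relative frequency."""
    counts = Counter(tuple(t) for t in traces)
    total = sum(counts.values())
    return {trace: c / total for trace, c in counts.items()}

def simulate_trace(weights, rng, max_len=20):
    """Toy model (assumed): 'a', then 'b' repeated with probability
    w_loop / (w_loop + w_exit) at each step, then 'c'."""
    w_loop, w_exit = weights
    p_loop = w_loop / (w_loop + w_exit)
    trace = ["a"]
    while len(trace) < max_len and rng.random() < p_loop:
        trace.append("b")
    trace.append("c")
    return trace

def total_variation(p, q):
    """Total variation distance between two stochastic languages."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def abc_rejection(log_lang, n_candidates, n_sims, epsilon, rng):
    """Keep the sampled weights whose simulated language is within
    epsilon of the log's language; return them sorted by distance."""
    accepted = []
    for _ in range(n_candidates):
        # Assumed prior: independent uniform weights.
        weights = (rng.uniform(0.01, 1.0), rng.uniform(0.01, 1.0))
        sim_lang = stochastic_language(
            simulate_trace(weights, rng) for _ in range(n_sims)
        )
        distance = total_variation(log_lang, sim_lang)
        if distance <= epsilon:
            accepted.append((distance, weights))
    return sorted(accepted)

# Hypothetical log of 100 traces over activities a, b, c.
LOG = (["ac"] * 70) + (["abc"] * 21) + (["abbc"] * 6) + (["abbbc"] * 3)
LOG_LANGUAGE = stochastic_language(LOG)
EPSILON = 0.1

posterior = abc_rejection(
    LOG_LANGUAGE, n_candidates=200, n_sims=500,
    epsilon=EPSILON, rng=random.Random(0),
)
```

In this sketch `n_sims` is the knob that trades runtime for accuracy: more simulated traces per candidate give a tighter estimate of the model's language at a proportionally higher cost.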

Problem

Research questions and friction points this paper is trying to address.

Optimizing stochastic process models to match observed execution logs

Overcoming scalability issues in exact stochastic language computation

Combining Bayesian inference and statistical model checking for large models

Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulation-based Bayesian parameter inference scheme

Expressive statistical model checking engine

Trading runtime for accuracy to scale to large models
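The runtime-versus-accuracy trade-off can be made concrete with a standard sample-size calculation. The sketch below uses the Chernoff-Hoeffding bound, which is common in statistical model checking; the source does not state which bound or estimator its engine uses, so this choice and the name `hoeffding_sample_size` are assumptions. The bound covers the estimate of a single probability (for example, the probability of one trace), not a whole stochastic language at once.

```python
import math

def hoeffding_sample_size(epsilon, delta):
    """Number of independent simulations n such that the empirical
    frequency of an event is within epsilon of its true probability
    with confidence at least 1 - delta: n >= ln(2/delta) / (2*epsilon^2)."""
    if not (0.0 < epsilon < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil(math.log(2.0 / delta) / (2.0 * epsilon ** 2))

# Tightening the precision tenfold costs roughly a hundredfold more runs.
coarse = hoeffding_sample_size(epsilon=0.1, delta=0.05)
fine = hoeffding_sample_size(epsilon=0.01, delta=0.05)
```

The number of runs grows quadratically as epsilon shrinks but only logarithmically as delta shrinks, which is why loosening the precision is the cheaper way to cut runtime.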
