🤖 AI Summary
Evaluating process mining techniques is hampered by the scarcity of real-world logs with ground-truth models, by noisy event logs, and by a disconnect between behavioral deviation modeling and log generation. This paper proposes a traceable synthetic benchmark framework that links models, logs, and deviations: taking an initial Petri net or BPMN process model as input, it combines a library of behavioral deviation patterns (e.g., skipping, redoing, reordering) with event-level log perturbation mechanisms to generate a deviating model and an imperfect log with controllable deviations. Unlike conventional approaches that inject noise only at the log level, the framework models behavioral deviations and recording errors jointly and keeps the link between model and log. The authors build synthetic datasets for three processes and use one of them in a conformance checking use case that assesses (relaxed) systemic alignments, quantitatively exposing differences in sensitivity and qualitatively characterizing how well different deviation types can be explained.
📝 Abstract
The assessment of process mining techniques using real-life data is often compromised by the lack of ground truth knowledge, the presence of non-essential outliers in system behavior, and recording errors in event logs. Synthetically generated data makes ground truth available and thereby enables better evaluation. Existing log generation tools inject noise directly into the logs, which does not capture many typical behavioral deviations. Furthermore, the link between the model and the log, which is needed for later assessment, is lost. We propose a ground-truth approach for generating process data from either existing or synthetic initial process models, whether automatically generated or hand-made. This approach incorporates patterns of behavioral deviations and recording errors to produce a synthetic yet realistic deviating model and imperfect event log. These, together with the initial model, are required to assess process mining techniques based on ground truth knowledge. We demonstrate this approach by creating datasets of synthetic process data for three processes, one of which we use in a conformance checking use case that focuses on the assessment of (relaxed) systemic alignments to expose and explain deviations in modeled and recorded behavior. Our results show that this approach, unlike traditional methods, provides detailed insights into the strengths and weaknesses of process mining techniques, both quantitatively and qualitatively.
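To make the distinction between behavioral deviations and recording errors concrete, the following is a minimal Python sketch, not the authors' implementation. It is a deliberate simplification: the approach described above applies deviation patterns at the model level, whereas this sketch applies them to a single reference trace. All names (`skip`, `redo`, `reorder`, `drop_event`, `generate_case`) and the probability parameters are hypothetical and chosen for illustration only. The point it illustrates is that each generated case keeps the reference behavior, the deviating behavior, the recorded behavior, and labels for every injected deviation, so the ground truth stays linked to the log.

```python
import random

# Behavioral deviation patterns (hypothetical, trace-level simplification).
def skip(trace, i):
    """The activity at position i is not executed."""
    return trace[:i] + trace[i + 1:]

def redo(trace, i):
    """The activity at position i is executed twice in a row."""
    return trace[:i + 1] + [trace[i]] + trace[i + 1:]

def reorder(trace, i):
    """The activities at positions i and i + 1 are executed in swapped order."""
    swapped = list(trace)
    swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    return swapped

PATTERNS = {"skip": skip, "redo": redo, "reorder": reorder}

def drop_event(trace, i):
    """Recording error: the activity happened but its event was not logged."""
    return trace[:i] + trace[i + 1:]

def generate_case(reference, rng, p_deviation=0.5, p_recording_error=0.2):
    """Return one case together with its ground-truth labels.

    'executed' is what actually happened (possibly deviating from the
    reference); 'recorded' is what ends up in the event log.
    """
    labels = []
    executed = list(reference)
    if len(executed) >= 2 and rng.random() < p_deviation:
        name = rng.choice(sorted(PATTERNS))
        i = rng.randrange(len(executed) - 1)  # i + 1 is always a valid index
        executed = PATTERNS[name](executed, i)
        labels.append(("behavioral", name, i))
    recorded = list(executed)
    if recorded and rng.random() < p_recording_error:
        i = rng.randrange(len(recorded))
        recorded = drop_event(recorded, i)
        labels.append(("recording", "missing_event", i))
    return {
        "reference": list(reference),
        "executed": executed,
        "recorded": recorded,
        "labels": labels,
    }

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the generated log is reproducible
    reference = ["register", "check", "decide", "notify"]
    for case in (generate_case(reference, rng) for _ in range(5)):
        print(case["recorded"], case["labels"])
```

Because the labels distinguish `"behavioral"` from `"recording"` entries, a conformance checking technique run on the `recorded` traces can be scored on whether it detects a deviation and on whether it attributes the deviation to the right cause.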