A Ground Truth Approach for Assessing Process Mining Techniques

📅 2025-01-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Evaluating process mining techniques is hampered by the scarcity of real-world logs with ground-truth models, by noisy event logs, and by a disconnect between behavioral deviation modeling and log generation. This paper proposes a traceable synthetic benchmark framework that links models, logs, and deviations. Taking an initial Petri net or BPMN process model as input, it combines a library of behavioral deviation patterns (e.g., skipping, redoing, reordering) with event-level log perturbation mechanisms to generate imperfect logs with controllable deviations. Unlike conventional approaches that inject noise only at the log level, the framework models behavioral deviations and recording errors jointly and at a fine granularity. The authors generate synthetic process data for three processes and use one of them in a conformance checking use case that assesses (relaxed) systemic alignments, quantitatively exposing differences in algorithmic sensitivity and qualitatively characterizing how well each deviation type can be explained.

📝 Abstract
The assessment of process mining techniques using real-life data is often compromised by the lack of ground truth knowledge, the presence of non-essential outliers in system behavior and recording errors in event logs. Using synthetically generated data could leverage ground truth for better evaluation. Existing log generation tools inject noise directly into the logs, which does not capture many typical behavioral deviations. Furthermore, the link between the model and the log, which is needed for later assessment, becomes lost. We propose a ground-truth approach for generating process data from either existing or synthetic initial process models, whether automatically generated or hand-made. This approach incorporates patterns of behavioral deviations and recording errors to produce a synthetic yet realistic deviating model and imperfect event log. These, together with the initial model, are required to assess process mining techniques based on ground truth knowledge. We demonstrate this approach to create datasets of synthetic process data for three processes, one of which we used in a conformance checking use case, focusing on the assessment of (relaxed) systemic alignments to expose and explain deviations in modeled and recorded behavior. Our results show that this approach, unlike traditional methods, provides detailed insights into the strengths and weaknesses of process mining techniques, both quantitatively and qualitatively.
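The two-stage idea in the abstract (behavioral deviations first, recording errors second, with the link to the initial model preserved) can be illustrated with a minimal sketch. This is not the authors' implementation: the model is reduced to a single sequential trace rather than a Petri net, and the names `Step`, `deviate`, `record`, and the probabilities `p_skip`, `p_repeat`, `p_miss` are hypothetical, chosen only for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    activity: str
    behavior: str          # "ok", "skipped", or "repeated" (hypothetical labels)
    recorded: bool = True  # False if the step never reaches the event log


def deviate(trace, rng, p_skip=0.2, p_repeat=0.2):
    """Stage 1: apply behavioral deviations to a modeled trace.

    Every resulting step keeps a label, so the link to the initial
    model is never lost.
    """
    steps = []
    for activity in trace:
        r = rng.random()
        if r < p_skip:
            # Modeled, but not executed.
            steps.append(Step(activity, "skipped"))
        elif r < p_skip + p_repeat:
            # Executed, then executed once more.
            steps.append(Step(activity, "ok"))
            steps.append(Step(activity, "repeated"))
        else:
            steps.append(Step(activity, "ok"))
    return steps


def record(steps, rng, p_miss=0.1):
    """Stage 2: apply recording errors to the executed steps.

    Returns the imperfect log and the fully annotated ground truth.
    """
    annotated = []
    for s in steps:
        if s.behavior == "skipped":
            # A skipped step was never executed, so it cannot be logged.
            annotated.append(Step(s.activity, s.behavior, recorded=False))
        else:
            logged = rng.random() >= p_miss
            annotated.append(Step(s.activity, s.behavior, recorded=logged))
    log = [s.activity for s in annotated if s.recorded]
    return log, annotated


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the run is reproducible
    model_trace = ["register", "check", "approve", "pay", "archive"]
    log, truth = record(deviate(model_trace, rng), rng)
    print("imperfect log:", log)
    for step in truth:
        print(step)
```

Because each logged or missing event carries its own label, the output of a conformance checker on `log` can be compared per deviation type against `truth`, which log-level noise injection alone would not allow.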
Problem

Research questions and friction points this paper is trying to address.

Workflow Analysis
Data Simulation
Accuracy Improvement
Innovation

Methods, ideas, or system contributions that make the work stand out.

Data Generation Method
Workflow Analysis
Bias and Error Simulation
Dominique Sommers
Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, the Netherlands
Natalia Sidorova
Technische Universiteit Eindhoven
Computer Science
B. V. Dongen
Mathematics and Computer Science, Eindhoven University of Technology, Eindhoven, the Netherlands