🤖 AI Summary
This paper addresses the challenge of detecting rework anomalies in business processes. It presents the first systematic investigation of GPT-4o-2024-08-06's capability to identify rework in acyclic synthetic event logs. We propose a prompt-engineering-based structured log parsing method that transforms unstructured event logs into reasoning-friendly formats, and design zero-shot, one-shot, and few-shot prompting strategies tailored to three rework distributions: normal, uniform, and exponential. Experimental results show that one-shot prompting achieves 96.14% accuracy under the normal distribution and few-shot prompting attains 97.94% under the uniform distribution, while accuracy falls to 74.21% with few-shot prompting under the exponential distribution. We uncover systematic correlations between prompting paradigms and anomaly distribution types. This work validates large language models (LLMs) as lightweight, training-free tools for process anomaly detection and establishes an LLM-driven paradigm for process mining, bridging foundation models and operational process analytics.
📝 Abstract
This paper investigates the effectiveness of GPT-4o-2024-08-06, one of OpenAI's large language models (LLMs), in detecting business process anomalies, with a focus on rework anomalies. We developed a GPT-4o-based tool that transforms event logs into a structured format and identifies reworked activities within business event logs. The analysis was performed on a synthetic dataset designed to contain rework anomalies but no loops. To evaluate the anomaly detection capabilities of GPT-4o-2024-08-06, we used three prompting techniques: zero-shot, one-shot, and few-shot. These techniques were tested on different anomaly distributions, namely normal, uniform, and exponential, to identify the most effective approach for each case. The results demonstrate the strong performance of GPT-4o-2024-08-06: on our dataset, the model achieved 96.14% accuracy with one-shot prompting for the normal distribution, 97.94% with few-shot prompting for the uniform distribution, and 74.21% with few-shot prompting for the exponential distribution. These results highlight the model's potential as a reliable tool for detecting rework anomalies in event logs and show how anomaly distribution and prompting strategy influence performance.
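To make the pipeline concrete, the sketch below shows how traces from an event log might be rendered in a structured, reasoning-friendly format and assembled into zero-, one-, or few-shot prompts for rework detection. This is a minimal illustration under assumptions: the trace format, prompt wording, and helper names (`format_trace`, `build_prompt`) are hypothetical and not taken from the paper, which does not publish its exact templates.

```python
# Hypothetical sketch of structured log parsing + prompt assembly for
# rework detection. Formats and wording are assumptions, not the
# paper's actual templates.

def format_trace(case_id, activities):
    """Render one case as a single structured line."""
    return f"Case {case_id}: " + " -> ".join(activities)

def detect_rework(activities):
    """Label helper for examples: an activity is reworked if it repeats."""
    seen, reworked = set(), []
    for act in activities:
        if act in seen and act not in reworked:
            reworked.append(act)
        seen.add(act)
    return reworked

def build_prompt(target, examples=()):
    """No examples -> zero-shot; one -> one-shot; several -> few-shot."""
    parts = [
        "Identify any reworked (repeated) activities in the trace.",
        "Answer with a comma-separated list, or 'none'.",
    ]
    for i, acts in enumerate(examples, 1):
        labels = detect_rework(acts) or ["none"]
        parts.append(format_trace(f"example-{i}", acts))
        parts.append("Reworked: " + ", ".join(labels))
    parts.append(format_trace("target", target))
    parts.append("Reworked:")
    return "\n".join(parts)

# One-shot prompt for a trace where "Check" is reworked.
prompt = build_prompt(
    target=["Register", "Check", "Check", "Approve"],
    examples=[["A", "B", "A", "C"]],
)
print(prompt)
```

The resulting string would be sent to the model (e.g. via OpenAI's chat API) and the completion after "Reworked:" parsed as the predicted set of reworked activities; switching `examples` between zero, one, and several entries reproduces the three prompting regimes compared in the paper.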