Applying Process Mining on Scientific Workflows: a Case Study

📅 2023-07-06
🏛️ arXiv.org
📈 Citations: 3
✨ Influential: 0
🤖 AI Summary
Problem: SLURM logs in HPC scientific workflows lack explicit case identifiers, which hinders the direct application of process mining. Method: This paper proposes an automatic job-correlation method based on implicit job dependency modeling: it parses SLURM logs and combines spatiotemporal job feature matching with graph-structured modeling to cluster unannotated jobs end to end. Contribution/Results: The paper introduces the first systematic preprocessing framework for process mining on HPC logs, integrating algorithms such as Heuristics Miner to support process discovery and bottleneck diagnosis. Evaluated on real-world HPC cluster logs, the approach significantly improves workflow traceability, accurately identifies I/O- and scheduler-related performance bottlenecks, and enables high-fidelity reconstruction of end-to-end process models.
๐Ÿ“ Abstract
Computer-based scientific experiments are becoming increasingly data-intensive, necessitating the use of High-Performance Computing (HPC) clusters to handle large scientific workflows. These workflows result in complex data and control flows within the system, making analysis challenging. This paper focuses on the extraction of case IDs from SLURM-based HPC cluster logs, a crucial step for applying mainstream process mining techniques. The core contribution is the development of methods to correlate jobs in the system, whether their interdependencies are explicitly specified or not. We present our log extraction and correlation techniques, supported by experiments that validate our approach, enabling comprehensive documentation of workflows and identification of performance bottlenecks.
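The case-ID extraction step described in the abstract can be illustrated for the simplest situation, where dependencies between jobs are stated explicitly. The following is a minimal sketch, not the authors' implementation: it assumes the dependencies have already been parsed from the logs into a hypothetical mapping from each job ID to the job IDs it depends on, and it treats every connected group of jobs as one case (one workflow execution) using a union-find structure. The function name `assign_case_ids` and the `case-<id>` label format are illustrative assumptions.

```python
def assign_case_ids(dependencies):
    """Group jobs into cases by following explicit dependency links.

    dependencies: dict mapping a job ID (int) to a list of job IDs it
    depends on. This input shape is an assumption for illustration.
    Returns a dict mapping every job ID to a case label.
    """
    parent = {}

    def find(job):
        # Register unseen jobs as their own root, then walk to the root
        # with path halving to keep later lookups short.
        parent.setdefault(job, job)
        while parent[job] != job:
            parent[job] = parent[parent[job]]
            job = parent[job]
        return job

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return
        # Keep the smaller job ID as root so case labels are deterministic.
        if root_a < root_b:
            parent[root_b] = root_a
        else:
            parent[root_a] = root_b

    for job, deps in dependencies.items():
        find(job)  # make sure jobs without dependencies get a case too
        for dep in deps:
            union(job, dep)

    return {job: f"case-{find(job)}" for job in parent}


# Hypothetical example: two independent workflows.
example = {101: [], 102: [101], 103: [102], 200: [], 201: [200]}
cases = assign_case_ids(example)
```

In this sketch, jobs 101, 102, and 103 end up in one case and jobs 200 and 201 in another. Jobs whose interdependencies are not specified would need a different correlation strategy, which this example does not attempt to cover.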
Problem

Research questions and friction points this paper is trying to address.

Extract case IDs from SLURM-based HPC logs.
Correlate jobs with explicit or implicit dependencies.
Document workflows and identify performance bottlenecks.
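Once jobs carry a case ID, one simple way to look for bottlenecks is to compare how long each case spent waiting in the queue with how long it spent running. The sketch below is an illustration under stated assumptions rather than the paper's method: it assumes each job record is a dictionary with hypothetical keys `case_id`, `submit`, `start`, and `end`, where the timestamps are ISO 8601 strings such as `2023-07-06T10:00:00`.

```python
from datetime import datetime


def job_durations(record):
    """Return (queue wait, run time) in seconds for one job record."""
    submit = datetime.fromisoformat(record["submit"])
    start = datetime.fromisoformat(record["start"])
    end = datetime.fromisoformat(record["end"])
    return (start - submit).total_seconds(), (end - start).total_seconds()


def case_summary(records):
    """Sum queue wait and run time per case.

    records: iterable of dicts with the assumed keys described above.
    Returns a dict mapping case ID to totals and a job count.
    """
    summary = {}
    for record in records:
        wait_s, run_s = job_durations(record)
        totals = summary.setdefault(
            record["case_id"], {"wait_s": 0.0, "run_s": 0.0, "jobs": 0}
        )
        totals["wait_s"] += wait_s
        totals["run_s"] += run_s
        totals["jobs"] += 1
    return summary


# Hypothetical records for a single case with two jobs.
records = [
    {"case_id": "case-101", "submit": "2023-07-06T10:00:00",
     "start": "2023-07-06T10:05:00", "end": "2023-07-06T10:35:00"},
    {"case_id": "case-101", "submit": "2023-07-06T10:35:00",
     "start": "2023-07-06T11:05:00", "end": "2023-07-06T11:15:00"},
]
summary = case_summary(records)
```

A case whose total wait time is large relative to its run time points toward scheduling delays rather than the computation itself, which is one kind of signal a process mining analysis could then examine in more detail.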
Innovation

Methods, ideas, or system contributions that make the work stand out.

Process Mining on HPC
SLURM log case ID extraction
Development of job correlation techniques
Zahra Sadeghibogar
Chair of Process and Data Science, RWTH Aachen University, Aachen, Germany
A. Berti
Chair of Process and Data Science, RWTH Aachen University, Aachen, Germany
Marco Pegoraro
PhD student at Sapienza University of Rome
Deep learning, Geometry Processing, Structural Biology
Wil M.P. van der Aalst
Chair of Process and Data Science, RWTH Aachen University, Aachen, Germany