AI Summary
SLURM logs in HPC scientific workflows lack explicit case identifiers, hindering the direct application of process mining. Method: This paper proposes an automatic job-correlation method based on implicit job dependency modeling: it parses SLURM logs and jointly leverages spatiotemporal job-feature matching and graph-structured modeling to achieve end-to-end clustering of unannotated jobs. Contribution/Results: We introduce the first systematic preprocessing framework for process mining on HPC logs, integrating algorithms such as the Heuristics Miner to support process discovery and bottleneck diagnosis. Evaluated on real-world HPC cluster logs, our approach significantly improves workflow traceability, accurately identifies I/O- and scheduler-related performance bottlenecks, and enables high-fidelity reconstruction of end-to-end process models.
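The spatiotemporal matching idea above can be illustrated with a minimal sketch: jobs by the same user submitted close together in time are grouped under one candidate case ID. The record fields (`job_id`, `user`, `submit`) and the five-minute window are illustrative assumptions, not the paper's actual parameters; real input would come from parsed SLURM accounting output.

```python
from datetime import datetime, timedelta

# Hypothetical SLURM accounting records; real logs would be parsed
# from sacct/slurmctld output. Field names are assumptions.
jobs = [
    {"job_id": 101, "user": "alice", "submit": datetime(2024, 5, 1, 9, 0, 5)},
    {"job_id": 102, "user": "alice", "submit": datetime(2024, 5, 1, 9, 0, 30)},
    {"job_id": 103, "user": "bob",   "submit": datetime(2024, 5, 1, 9, 1, 0)},
    {"job_id": 104, "user": "alice", "submit": datetime(2024, 5, 1, 13, 0, 0)},
]

def assign_case_ids(jobs, window=timedelta(minutes=5)):
    """Group jobs of the same user whose submit time falls within `window`
    of the previous job in the group; each group is one candidate case."""
    cases = {}
    last_seen = {}  # user -> (case_id, submit time of latest job in case)
    next_case = 0
    for job in sorted(jobs, key=lambda j: j["submit"]):
        user = job["user"]
        if user in last_seen and job["submit"] - last_seen[user][1] <= window:
            case_id = last_seen[user][0]
        else:
            case_id = next_case
            next_case += 1
        last_seen[user] = (case_id, job["submit"])
        cases[job["job_id"]] = case_id
    return cases

print(assign_case_ids(jobs))
# jobs 101 and 102 share a case; bob's job and the afternoon job start new cases
```

In practice such a temporal heuristic would be combined with further job features (account, partition, script name) rather than submit time alone.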
Abstract
Computer-based scientific experiments are becoming increasingly data-intensive, necessitating the use of High-Performance Computing (HPC) clusters to handle large scientific workflows. These workflows result in complex data and control flows within the system, making analysis challenging. This paper focuses on the extraction of case IDs from SLURM-based HPC cluster logs, a crucial step for applying mainstream process mining techniques. The core contribution is the development of methods to correlate jobs in the system, whether their interdependencies are explicitly specified or not. We present our log extraction and correlation techniques, supported by experiments that validate our approach, enabling comprehensive documentation of workflows and identification of performance bottlenecks.
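When interdependencies are explicitly specified (e.g. jobs submitted with SLURM's `--dependency=afterok:<jobid>`), one natural way to derive case IDs is to treat each connected component of the dependency graph as a case. The sketch below shows this with a union-find structure; the job IDs and edges are illustrative assumptions, not data from the paper.

```python
def find(parent, x):
    """Find the root of x with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def case_ids_from_dependencies(job_ids, edges):
    """Assign one case ID per connected component of the dependency graph.
    `edges` holds (job, prerequisite_job) pairs, e.g. from --dependency flags.
    The component's smallest job ID serves as its case label."""
    parent = {j: j for j in job_ids}
    for a, b in edges:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
    comp = {}
    for j in sorted(job_ids):
        comp.setdefault(find(parent, j), j)  # smallest id seen first
    return {j: comp[find(parent, j)] for j in job_ids}

# Illustrative dependency chain: 102 after 101, 103 after 102, 105 after 104
print(case_ids_from_dependencies(
    [101, 102, 103, 104, 105],
    [(102, 101), (103, 102), (105, 104)],
))
# → {101: 101, 102: 101, 103: 101, 104: 104, 105: 104}
```

Jobs without explicit dependencies would then fall back to implicit correlation, e.g. the spatiotemporal matching described in the summary.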