AI Summary
SLURM logs in HPC scientific workflows lack explicit case identifiers, hindering the direct application of process mining. Method: This paper proposes an automatic job-correlation method based on implicit job dependency modeling: it parses SLURM logs and jointly leverages spatiotemporal job-feature matching and graph-structured modeling to achieve end-to-end clustering of unannotated jobs. Contribution/Results: We introduce the first systematic preprocessing framework for process mining on HPC logs, integrating algorithms such as the Heuristics Miner to support process discovery and bottleneck diagnosis. Evaluated on real-world HPC cluster logs, our approach significantly improves workflow traceability, accurately identifies I/O- and scheduler-related performance bottlenecks, and enables high-fidelity reconstruction of end-to-end process models.
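The spatiotemporal matching idea above can be illustrated with a minimal sketch: jobs by the same user submitted close together in time are grouped under one candidate case ID. The record fields (`job_id`, `user`, `submit`) and the five-minute window are illustrative assumptions, not the paper's actual parameters; real input would come from parsed SLURM accounting output.

```python
from datetime import datetime, timedelta

# Hypothetical SLURM accounting records; real logs would be parsed
# from sacct/slurmctld output. Field names are assumptions.
jobs = [
    {"job_id": 101, "user": "alice", "submit": datetime(2024, 5, 1, 9, 0, 5)},
    {"job_id": 102, "user": "alice", "submit": datetime(2024, 5, 1, 9, 0, 30)},
    {"job_id": 103, "user": "bob",   "submit": datetime(2024, 5, 1, 9, 1, 0)},
    {"job_id": 104, "user": "alice", "submit": datetime(2024, 5, 1, 13, 0, 0)},
]

def assign_case_ids(jobs, window=timedelta(minutes=5)):
    """Group jobs of the same user whose submit time falls within `window`
    of the previous job in the group; each group is one candidate case."""
    cases = {}
    last_seen = {}  # user -> (case_id, submit time of latest job in case)
    next_case = 0
    for job in sorted(jobs, key=lambda j: j["submit"]):
        user = job["user"]
        if user in last_seen and job["submit"] - last_seen[user][1] <= window:
            case_id = last_seen[user][0]
        else:
            case_id = next_case
            next_case += 1
        last_seen[user] = (case_id, job["submit"])
        cases[job["job_id"]] = case_id
    return cases

print(assign_case_ids(jobs))
# jobs 101 and 102 share a case; bob's job and the afternoon job start new cases
```

In practice such a temporal heuristic would be combined with further job features (account, partition, script name) rather than submit time alone.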
Abstract
Computer-based scientific experiments are becoming increasingly data-intensive, necessitating the use of High-Performance Computing (HPC) clusters to handle large scientific workflows. These workflows result in complex data and control flows within the system, making analysis challenging. This paper focuses on the extraction of case IDs from SLURM-based HPC cluster logs, a crucial step for applying mainstream process mining techniques. The core contribution is the development of methods to correlate jobs in the system, whether their interdependencies are explicitly specified or not. We present our log extraction and correlation techniques, supported by experiments that validate our approach, enabling comprehensive documentation of workflows and identification of performance bottlenecks.
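When interdependencies are explicitly specified (e.g. jobs submitted with SLURM's `--dependency=afterok:<jobid>`), one natural way to derive case IDs is to treat each connected component of the dependency graph as a case. The sketch below shows this with a union-find structure; the job IDs and edges are illustrative assumptions, not data from the paper.

```python
def find(parent, x):
    """Find the root of x with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def case_ids_from_dependencies(job_ids, edges):
    """Assign one case ID per connected component of the dependency graph.
    `edges` holds (job, prerequisite_job) pairs, e.g. from --dependency flags.
    The component's smallest job ID serves as its case label."""
    parent = {j: j for j in job_ids}
    for a, b in edges:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[ra] = rb
    comp = {}
    for j in sorted(job_ids):
        comp.setdefault(find(parent, j), j)  # smallest id seen first
    return {j: comp[find(parent, j)] for j in job_ids}

# Illustrative dependency chain: 102 after 101, 103 after 102, 105 after 104
print(case_ids_from_dependencies(
    [101, 102, 103, 104, 105],
    [(102, 101), (103, 102), (105, 104)],
))
# → {101: 101, 102: 101, 103: 101, 104: 104, 105: 104}
```

Jobs without explicit dependencies would then fall back to implicit correlation, e.g. the spatiotemporal matching described in the summary.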