AI Summary
This work addresses the reproducibility gaps, unfair comparisons, and high reimplementation costs that plague existing provenance graph-based intrusion detection systems (PIDSs), problems which stem from the absence of a unified evaluation framework. To this end, we propose PIDSMaker, an open-source framework that, for the first time, enables standardized evaluation of PIDSs. PIDSMaker integrates eight state-of-the-art methods into a consistent pipeline with uniform data preprocessing, labeling conventions, and evaluation protocols. Its modular architecture, driven by YAML configuration files, supports code-free component composition, rapid prototyping, and ablation studies. The framework is further augmented with visualization tools and publicly released preprocessed datasets. Collectively, these contributions improve the reproducibility, efficiency, and fairness of PIDS evaluation.
Abstract
Recent provenance-based intrusion detection systems (PIDSs) have demonstrated strong potential for detecting advanced persistent threats (APTs) by applying machine learning to system provenance graphs. However, evaluating and comparing PIDSs remains difficult: prior work uses inconsistent preprocessing pipelines, non-standard dataset splits, and incompatible ground-truth labeling and metrics. These discrepancies undermine reproducibility, impede fair comparison, and impose substantial re-implementation overhead on researchers. We present PIDSMaker, an open-source framework for developing and evaluating PIDSs under consistent protocols. PIDSMaker consolidates eight state-of-the-art systems into a modular, extensible architecture with standardized preprocessing and ground-truth labels, enabling consistent experiments and apples-to-apples comparisons. A YAML-based configuration interface supports rapid prototyping by composing components across systems without code changes. PIDSMaker also includes utilities for ablation studies, hyperparameter tuning, multi-run instability measurement, and visualization, addressing methodological gaps identified in prior work. We demonstrate PIDSMaker through concrete use cases and release it with preprocessed datasets and labels to support shared evaluation for the PIDS community.
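To make the YAML-driven composition concrete, a configuration in this style might select a preprocessing pipeline, an encoder from one system, and a detector from another without writing code. This is an illustrative sketch only: the keys, component names, and file layout below are assumptions, not taken from PIDSMaker's actual configuration schema.

```yaml
# Hypothetical PIDSMaker-style experiment configuration (illustrative only;
# actual key names and component identifiers may differ in the framework).
experiment:
  name: cross-system-ablation
  seed: 42                 # fixed seed so multi-run instability can be measured
  runs: 5                  # repeated runs to report variance, not a single score

dataset:
  name: darpa-tc-e3        # assumed identifier for a released preprocessed dataset
  split: standardized      # use the framework's shared train/test split
  labels: shared-ground-truth

pipeline:
  preprocessing: uniform-default   # the common preprocessing stage
  # Components could, hypothetically, be mixed across integrated systems:
  encoder:
    source: system-a               # placeholder name for one integrated PIDS
    type: graph-embedding
  detector:
    source: system-b               # placeholder name for another integrated PIDS
    type: anomaly-score
    threshold: auto                # tuned on validation data, not the test set

evaluation:
  metrics: [precision, recall, fpr]
  visualization: enabled           # emit the framework's plots for inspection
```

Swapping `encoder.source` or `detector.source` in such a file, rather than editing code, is what would enable the apples-to-apples ablations the abstract describes.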