🤖 AI Summary
This work addresses key challenges in network intrusion detection—namely data scarcity, privacy sensitivity, and insufficient model robustness—by introducing a novel multimodal dataset that unifies traffic, payload, and temporal contextual features into a cohesive representation space. To enhance data availability while preserving privacy, the study proposes a synthetic data generation approach that integrates adversarial generative models with the Synthetic Data Vault (SDV) framework. The fidelity, utility, and privacy-preserving properties of the generated data are rigorously validated through f-divergence metrics, distinguishability tests, TRTS/TSTR evaluations, and non-parametric statistical analyses. Experimental results demonstrate that the proposed method significantly improves the accuracy and generalization capability of intrusion detection models, thereby establishing a high-quality, reproducible foundation for cybersecurity research and evaluation.
📝 Abstract
Supervised detection of network attacks has always been a critical part of network intrusion detection systems (NIDS). At a pivotal time for artificial intelligence (AI), when increasingly sophisticated attacks exploit advanced techniques such as generative artificial intelligence (GenAI) and reinforcement learning, it has become vital for protecting personal data scattered across the web. In this paper, we address two tasks on the first unified multi-modal NIDS dataset, which incorporates flow-level data, packet payload information, and temporal contextual features from the reprocessed CIC-IDS-2017, CIC-IoT-2023, UNSW-NB15, and CIC-DDoS-2019 datasets in a common feature space. In the first task, we use machine learning (ML) algorithms with stratified cross-validation to detect network attacks stably and reliably. In the second task, we use adversarial learning algorithms to generate synthetic data, compare it with the real data, and evaluate its fidelity, utility, and privacy using the Synthetic Data Vault (SDV) framework, the TRTS (train on real, test on synthetic) and TSTR (train on synthetic, test on real) tests, f-divergences, distinguishability, and non-parametric statistical tests. The findings provide stable ML models for intrusion detection and generative models with high fidelity and utility.
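To make the f-divergence part of the fidelity evaluation concrete, the following is a minimal sketch and not the paper's implementation. It assumes a single numeric feature, a shared histogram binning, and three common f-divergences (Kullback-Leibler, Jensen-Shannon, and total variation); the abstract does not say which divergences or estimators are used, and the helper names and toy Gaussian data are hypothetical.

```python
import math
import random


def histogram(values, edges):
    """Bin values over shared edges and return smoothed bin probabilities.

    Values outside the edge range are clamped into the first or last bin.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if v < edges[i + 1] or i == len(counts) - 1:
                counts[i] += 1
                break
    eps = 1e-9  # assumption: tiny smoothing so no bin has zero probability
    total = sum(counts) + eps * len(counts)
    return [(c + eps) / total for c in counts]


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: symmetric, bounded above by ln(2)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def total_variation(p, q):
    """Total variation distance, a value in [0, 1]."""
    return 0.5 * sum(abs(pi - qi) for pi, qi in zip(p, q))


# Toy data standing in for one real feature and two synthetic versions of it.
rng = random.Random(0)
real = [rng.gauss(0.0, 1.0) for _ in range(5000)]
synthetic_close = [rng.gauss(0.05, 1.0) for _ in range(5000)]
synthetic_far = [rng.gauss(1.5, 0.5) for _ in range(5000)]

# Shared bin edges: 20 equal-width bins on [-4, 4].
edges = [-4.0 + 0.4 * i for i in range(21)]
p_real = histogram(real, edges)

for name, sample in [("close", synthetic_close), ("far", synthetic_far)]:
    q = histogram(sample, edges)
    print(name,
          "KL=%.4f" % kl_divergence(p_real, q),
          "JS=%.4f" % js_divergence(p_real, q),
          "TV=%.4f" % total_variation(p_real, q))
```

Lower values indicate that the synthetic marginal is closer to the real one. In practice such scores would be computed per feature and read alongside the TSTR/TRTS and distinguishability results, since matching marginals alone does not imply that joint structure is preserved.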