🤖 AI Summary
To address the scarcity of network attack data, which severely limits intrusion detection system (IDS) performance, this paper proposes PHANTOM, a framework for generating high-fidelity synthetic attack traffic. The method combines a dual-path VAE-GAN architecture with progressive training and attack-semantic-aware domain feature matching, so that generated traffic preserves both statistical traffic characteristics and attack logic consistency. In an evaluation on 100,000 real-world traffic samples, an IDS trained solely on the synthetic data achieves 98% weighted accuracy on real attacks. Quantitative analysis shows significant improvements over state-of-the-art methods in distribution fidelity, sample diversity, and attack semantic fidelity, although generating rare attack types under severe class imbalance remains a limitation. The framework supports robust, privacy-preserving IDS training in low-resource settings.
📝 Abstract
The scarcity of cyberattack data hinders the development of robust intrusion detection systems. This paper introduces PHANTOM, a novel adversarial variational framework for generating high-fidelity synthetic attack data. Its innovations include progressive training, a dual-path VAE-GAN architecture, and domain-specific feature matching to preserve the semantics of attacks. Evaluated on 100,000 network traffic samples, models trained on PHANTOM data achieve 98% weighted accuracy on real attacks. Statistical analyses confirm that the synthetic data preserves authentic distributions and diversity. Limitations in generating rare attack types are noted, highlighting challenges with severe class imbalance. This work advances the generation of synthetic data for training robust, privacy-preserving detection systems.
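The abstract reports "weighted accuracy" without defining it. The sketch below is not the paper's evaluation code. It illustrates two common readings of the term on made-up labels: per-class recall weighted by class support (which is numerically the same as plain accuracy) and the unweighted mean of per-class recall, often called balanced accuracy. The function names, the labels `dos` and `r2l`, and the toy predictions are all assumptions for illustration.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return {label: fraction of that label's samples predicted correctly}."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / n for label, n in support.items()}


def support_weighted_accuracy(y_true, y_pred):
    """Per-class recall averaged with class-frequency weights.

    Algebraically this equals plain accuracy (correct / total).
    """
    support = Counter(y_true)
    recall = per_class_recall(y_true, y_pred)
    total = len(y_true)
    return sum(recall[c] * support[c] / total for c in support)


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall; every class counts equally."""
    recall = per_class_recall(y_true, y_pred)
    return sum(recall.values()) / len(recall)


# Hypothetical, imbalanced toy data: 8 common attacks, 2 rare ones.
y_true = ["dos"] * 8 + ["r2l"] * 2
# The classifier gets every common attack right and misses one rare attack.
y_pred = ["dos"] * 8 + ["dos", "r2l"]

print(round(support_weighted_accuracy(y_true, y_pred), 2))  # 0.9
print(round(balanced_accuracy(y_true, y_pred), 2))          # 0.75
```

On imbalanced data the two readings diverge: a missed rare attack barely moves the support-weighted score but lowers the balanced one sharply. This is why the noted limitation on rare attack types matters when interpreting a single headline figure.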