🤖 AI Summary
To address key challenges in intrusion detection, including heterogeneous network traffic, evolving threat landscapes, and severe class imbalance, this paper proposes the first diffusion-based multimodal generative framework for the task. The method integrates Transformer and CNN-based variational encoders to construct a unified latent prior, enabling joint modeling of tabular flow features and image-like traffic representations; this captures cross-domain dependencies and supports coherent cross-modal generation. An EDM-style denoiser is trained jointly with the encoders end to end. Evaluated on CIC-IDS-2017 and NSL-KDD, the framework outperforms the state-of-the-art baselines TabSyn and TabDDPM in sample fidelity and diversity, and yields substantial improvements in downstream intrusion detection performance, establishing new state-of-the-art results across all reported metrics.
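The unified latent prior over both modalities can be sketched minimally as follows. Two modality-specific variational encoders (in the paper, a Transformer for tabular features and a CNN for the image view; reduced here to linear maps purely for illustration) each produce a Gaussian posterior, and their samples are fused into one joint latent over which the diffusion prior would be learned. All dimensions and names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w_mu, w_logvar):
    """Toy variational encoder: a linear map standing in for the
    Transformer (tabular) / CNN (image) encoders of the paper."""
    mu, logvar = x @ w_mu, x @ w_logvar
    # reparameterization trick: sample z = mu + sigma * eps
    z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    return z, mu, logvar

# illustrative shapes: 32 tabular flow features, flattened 8x8 traffic
# image, 16-dimensional latent per modality, batch of 4 flows
tab = rng.standard_normal((4, 32))   # tabular flow features
img = rng.standard_normal((4, 64))   # image-like traffic representation

z_tab, _, _ = encode(tab, 0.1 * rng.standard_normal((32, 16)),
                     0.1 * rng.standard_normal((32, 16)))
z_img, _, _ = encode(img, 0.1 * rng.standard_normal((64, 16)),
                     0.1 * rng.standard_normal((64, 16)))

# unified latent: both modality latents fused into one joint vector;
# a diffusion model over this space ties the two modalities together
z_joint = np.concatenate([z_tab, z_img], axis=1)
```

Fusing by concatenation is only one option; the key point is that a single diffusion prior over `z_joint` forces the generated tabular and image views to stay mutually consistent.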
📝 Abstract
Modern Intrusion Detection Systems (IDS) face severe challenges due to heterogeneous network traffic, evolving cyber threats, and pronounced data imbalance between benign and attack flows. While generative models have shown promise for data augmentation, existing approaches are limited to single modalities and fail to capture cross-domain dependencies. This paper introduces MAGE-ID (Multimodal Attack Generator for Intrusion Detection), a diffusion-based generative framework that couples tabular flow features with their image-transformed counterparts through a unified latent prior. By jointly training Transformer- and CNN-based variational encoders with an EDM-style denoiser, MAGE-ID achieves balanced and coherent multimodal synthesis. Evaluations on CIC-IDS-2017 and NSL-KDD demonstrate significant improvements in fidelity, diversity, and downstream detection performance over TabSyn and TabDDPM, highlighting the effectiveness of MAGE-ID for multimodal IDS augmentation.
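The "EDM-style denoiser" refers to the preconditioning and loss-weighting recipe of Karras et al. (2022). A minimal NumPy sketch of that recipe applied to a generic latent batch is shown below; the network is a placeholder, and nothing here reproduces MAGE-ID's actual architecture.

```python
import numpy as np

def edm_precond(sigma, sigma_data=0.5):
    """EDM preconditioning coefficients (Karras et al., 2022):
    c_skip, c_out, c_in scale the skip path, network output, and
    network input; c_noise conditions the network on the noise level."""
    c_skip = sigma_data**2 / (sigma**2 + sigma_data**2)
    c_out = sigma * sigma_data / np.sqrt(sigma**2 + sigma_data**2)
    c_in = 1.0 / np.sqrt(sigma**2 + sigma_data**2)
    c_noise = 0.25 * np.log(sigma)
    return c_skip, c_out, c_in, c_noise

def edm_loss(raw_net, z0, sigma, sigma_data=0.5, rng=None):
    """Weighted denoising score-matching loss on a clean latent batch z0
    at noise level sigma; raw_net(x, c_noise) is the inner network."""
    if rng is None:
        rng = np.random.default_rng(0)
    z_noisy = z0 + sigma * rng.standard_normal(z0.shape)
    c_skip, c_out, c_in, c_noise = edm_precond(sigma, sigma_data)
    # preconditioned denoiser: skip connection plus scaled network output
    denoised = c_skip * z_noisy + c_out * raw_net(c_in * z_noisy, c_noise)
    # EDM loss weight lambda(sigma) equalizes gradient magnitude per level
    weight = (sigma**2 + sigma_data**2) / (sigma * sigma_data) ** 2
    return weight * np.mean((denoised - z0) ** 2)
```

At low noise the skip path dominates (`c_skip` approaches 1), so the network only needs to predict a small correction; this property is what makes the recipe stable across the wide range of noise levels used in diffusion training.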