Binary and Multiclass Cyberattack Classification on GeNIS Dataset

📅 2025-11-11

🤖 AI Summary

AI models in network intrusion detection systems (NIDS) generalize poorly when their training data is low-quality, insufficiently diverse, or outdated. This study experimentally evaluates the reliability of the GeNIS dataset for both binary and multiclass network attack detection. It combines five feature selection methods, namely information gain, the chi-squared test, recursive feature elimination (RFE), mean absolute deviation, and dispersion ratio, to identify the most relevant features and reduce dimensionality. Three decision tree ensembles and two deep neural networks are then trained; all achieve high accuracy and F1-scores, with the ensembles generalizing slightly better and remaining more computationally efficient than the deep learning models. The results indicate that GeNIS, with its time-based and quantity-based behavioral features, supports AI-based intrusion detection and can serve as a baseline for future benchmarks.

📝 Abstract

The integration of Artificial Intelligence (AI) in Network Intrusion Detection Systems (NIDS) is a promising approach to tackle the increasing sophistication of cyberattacks. However, since Machine Learning (ML) and Deep Learning (DL) models rely heavily on the quality of their training data, the lack of diverse and up-to-date datasets hinders their ability to generalize and detect malicious activity in previously unseen network traffic. This study presents an experimental validation of the reliability of the GeNIS dataset for AI-based NIDS, to serve as a baseline for future benchmarks. Five feature selection methods (Information Gain, Chi-Squared Test, Recursive Feature Elimination, Mean Absolute Deviation, and Dispersion Ratio) were combined to identify the most relevant features of GeNIS and reduce its dimensionality, enabling more computationally efficient detection. Three decision tree ensembles and two deep neural networks were trained for both binary and multiclass classification tasks. All models reached high accuracy and F1-scores, and the ML ensembles achieved slightly better generalization while remaining more efficient than the DL models. Overall, the results indicate that the GeNIS dataset, with its time-based and quantity-based behavioral features, supports intelligent intrusion detection and cyberattack classification.
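
The abstract does not say how the five selectors were combined, so the following is only a minimal sketch of one plausible scheme: each selector scores every feature, keeps its top k, and a feature survives if enough selectors voted for it. Only Mean Absolute Deviation and Dispersion Ratio (arithmetic mean over geometric mean) are implemented here; Information Gain, Chi-Squared Test, and Recursive Feature Elimination would each contribute one more set to the vote. The voting rule, the thresholds, and the feature names are assumptions made for illustration and are not taken from the paper.

```python
import math
from statistics import mean


def mean_absolute_deviation(column):
    """Average absolute distance of the values from their mean."""
    mu = mean(column)
    return mean(abs(x - mu) for x in column)


def dispersion_ratio(column):
    """Arithmetic mean divided by geometric mean (values must be positive)."""
    arithmetic = mean(column)
    geometric = math.exp(mean(math.log(x) for x in column))
    return arithmetic / geometric


def top_k(scores, k):
    """Names of the k highest-scoring features."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def vote(selections, min_votes):
    """Keep features chosen by at least min_votes selectors (assumed rule)."""
    counts = {}
    for chosen in selections:
        for name in chosen:
            counts[name] = counts.get(name, 0) + 1
    return sorted(name for name, c in counts.items() if c >= min_votes)


# Toy data with hypothetical feature names, not actual GeNIS columns.
features = {
    "flow_duration": [1.0, 9.0, 2.0, 8.0],
    "packet_count": [4.0, 6.0, 5.0, 5.0],
    "constant_flag": [3.0, 3.0, 3.0, 3.0],
}

mad_scores = {n: mean_absolute_deviation(c) for n, c in features.items()}
dr_scores = {n: dispersion_ratio(c) for n, c in features.items()}

# Each selector keeps its two best features; both must agree to keep one.
selected = vote([top_k(mad_scores, 2), top_k(dr_scores, 2)], min_votes=2)
print(selected)
```

Both filters reward features whose values spread out, so the constant column scores lowest under each and is dropped. A supervised selector such as Information Gain would also need the class labels, which this sketch leaves out.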

Problem

Research questions and friction points this paper is trying to address.

Evaluating GeNIS dataset reliability for AI-based intrusion detection

Identifying the most relevant GeNIS features by combining feature selection methods to reduce dimensionality

Comparing ML and DL models for cyberattack classification performance
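
The model comparison rests on accuracy and F1-score for both task types. As a minimal sketch of how those two metrics are computed, the snippet below scores a multiclass prediction and then collapses the same labels to benign versus attack for the binary task. The class names are hypothetical, and macro averaging of the per-class F1 is an assumption, since the page does not state which averaging the paper uses.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, label):
    """F1 = 2*TP / (2*TP + FP + FN) for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (assumed averaging)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, l) for l in labels) / len(labels)


def to_binary(labels):
    """Collapse every non-benign class into a single 'attack' class."""
    return ["benign" if l == "benign" else "attack" for l in labels]


# Hypothetical labels for illustration only.
y_true = ["benign", "dos", "dos", "scan", "benign", "scan"]
y_pred = ["benign", "dos", "benign", "scan", "benign", "scan"]

print("multiclass accuracy:", accuracy(y_true, y_pred))
print("multiclass macro F1:", macro_f1(y_true, y_pred))
print("binary attack F1:", f1_for_class(to_binary(y_true), to_binary(y_pred), "attack"))
```

Accuracy is identical for the two tasks here because the single error confuses an attack with benign traffic. An error between two attack classes would lower only the multiclass score.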

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining five feature selection methods for dimensionality reduction

Training three decision tree ensembles and two deep neural networks for both binary and multiclass classification

Using time-based and quantity-based behavioral features
