Binary and Multiclass Cyberattack Classification on GeNIS Dataset

📅 2025-11-11
🤖 AI Summary
To address the weak generalization of AI models in network intrusion detection systems (NIDS) caused by training data that is low-quality, insufficiently diverse, or outdated, this study experimentally evaluates the reliability of the GeNIS dataset for both binary and multiclass network attack detection. Five feature selection methods (information gain, chi-squared test, recursive feature elimination (RFE), mean absolute deviation, and dispersion ratio) are combined to identify the most relevant features and reduce dimensionality, retaining time-based and quantity-based behavioral features. On the reduced feature set, three decision tree ensembles and two deep neural networks are trained; all achieve high accuracy and F1-scores, with the decision tree ensembles generalizing slightly better and running more efficiently than the deep learning models. The results serve as a baseline for future benchmarks on GeNIS and as empirical guidance for feature selection and model choice in NIDS research.

📝 Abstract
The integration of Artificial Intelligence (AI) in Network Intrusion Detection Systems (NIDS) is a promising approach to tackle the increasing sophistication of cyberattacks. However, since Machine Learning (ML) and Deep Learning (DL) models rely heavily on the quality of their training data, the lack of diverse and up-to-date datasets hinders their generalization capability to detect malicious activity in previously unseen network traffic. This study presents an experimental validation of the reliability of the GeNIS dataset for AI-based NIDS, to serve as a baseline for future benchmarks. Five feature selection methods (Information Gain, Chi-Squared Test, Recursive Feature Elimination, Mean Absolute Deviation, and Dispersion Ratio) were combined to identify the most relevant features of GeNIS and reduce its dimensionality, enabling more computationally efficient detection. Three decision tree ensembles and two deep neural networks were trained for both binary and multiclass classification tasks. All models reached high accuracy and F1-scores, and the ML ensembles achieved slightly better generalization while remaining more efficient than DL models. Overall, the obtained results indicate that the GeNIS dataset supports intelligent intrusion detection and cyberattack classification with time-based and quantity-based behavioral features.
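As a rough illustration of how several feature selection scores could be combined, the sketch below implements two of the five methods named in the abstract (Mean Absolute Deviation and Dispersion Ratio, the ratio of arithmetic to geometric mean) and merges their top-ranked features by voting. The abstract does not say how the methods were combined, so the voting rule, the `top_k` and `vote` helpers, the `eps` shift, and the toy feature names are all assumptions made for this example, not the authors' procedure.

```python
import math
from collections import Counter


def mean_absolute_deviation(column):
    """Average absolute distance of the values from their mean."""
    mean = sum(column) / len(column)
    return sum(abs(v - mean) for v in column) / len(column)


def dispersion_ratio(column, eps=1e-9):
    """Arithmetic mean divided by geometric mean (non-negative values only).

    The small eps shift is an assumption of this sketch to avoid log(0).
    """
    shifted = [v + eps for v in column]
    arithmetic = sum(shifted) / len(shifted)
    geometric = math.exp(sum(math.log(v) for v in shifted) / len(shifted))
    return arithmetic / geometric


def top_k(scores, k):
    """Names of the k highest-scoring features."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])


def vote(selections, min_votes):
    """Keep features chosen by at least min_votes selectors (assumed rule)."""
    counts = Counter(name for chosen in selections for name in chosen)
    return sorted(name for name, n in counts.items() if n >= min_votes)


# Toy data with hypothetical feature names, not actual GeNIS columns.
features = {
    "flow_duration": [0.1, 5.0, 0.2, 9.0],
    "packet_count": [10, 10, 10, 10],  # constant, carries no information
    "bytes_total": [100, 4000, 120, 9000],
}

mad = {name: mean_absolute_deviation(col) for name, col in features.items()}
ratio = {name: dispersion_ratio(col) for name, col in features.items()}
selected = vote([top_k(mad, 2), top_k(ratio, 2)], min_votes=2)
print(selected)
```

Both scores are unsupervised and ignore the class label, and Mean Absolute Deviation depends on feature scale, so in practice the columns would typically be normalized first. The label-aware methods (Information Gain, Chi-Squared Test, Recursive Feature Elimination) could feed additional selections into the same `vote` call.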
Problem

Research questions and friction points this paper is trying to address.

Evaluating GeNIS dataset reliability for AI-based intrusion detection
Identifying the most relevant GeNIS features to reduce dimensionality
Comparing ML and DL models for cyberattack classification performance
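The comparison relies on accuracy and F1-score for both the binary and the multiclass task. The minimal sketch below shows how these metrics can be computed and how the same predictions score differently once attack classes are collapsed into a single label. The macro-averaging choice, the class names, and the `to_binary` helper are assumptions for illustration; the summary does not state which F1 averaging the authors used.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """F1-score of each class, treating it as the positive class in turn."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores[label] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores (assumed averaging)."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


def to_binary(labels, benign="benign"):
    """Collapse every non-benign class into a single 'attack' label."""
    return [benign if label == benign else "attack" for label in labels]


# Hypothetical labels; the actual GeNIS attack classes are not listed here.
y_true = ["benign", "benign", "dos", "dos", "scan", "scan"]
y_pred = ["benign", "benign", "dos", "scan", "scan", "scan"]

multiclass_accuracy = accuracy(y_true, y_pred)
multiclass_macro_f1 = macro_f1(y_true, y_pred)
binary_accuracy = accuracy(to_binary(y_true), to_binary(y_pred))
print(multiclass_accuracy, multiclass_macro_f1, binary_accuracy)
```

In this toy case the only error confuses one attack type with another, so it lowers the multiclass scores but disappears in the binary view, which is one reason the two tasks are reported separately.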
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combining five feature selection methods for dimensionality reduction
Training three decision tree ensembles and two deep neural networks for binary and multiclass classification
Using time-based and quantity-based behavioral features
Miguel Silva
GECAD, ISEP, Polytechnic of Porto, rua Dr. António Bernardino de Almeida, 4249-015 Porto, Portugal
Daniela Pinto
GECAD, ISEP, Polytechnic of Porto, rua Dr. António Bernardino de Almeida, 4249-015 Porto, Portugal
João Vitorino
GECAD, ISEP, Polytechnic of Porto, rua Dr. António Bernardino de Almeida, 4249-015 Porto, Portugal
Eva Maia
GECAD-ISEP
CyberSecurity, Artificial Intelligence, Machine Learning, Industry 4.0, Encryption
Isabel Praça
Professor, ISEP
Ivone Amorim
GECAD-ISEP and DCC-FCUP
Cybersecurity, Cryptography, Blockchain
M. J. Viamonte
GECAD, ISEP, Polytechnic of Porto, rua Dr. António Bernardino de Almeida, 4249-015 Porto, Portugal