Tutorial on Flow-Based Network Traffic Classification Using Machine Learning

📅 2026-01-07
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes an end-to-end, reproducible framework for supervised flow-based traffic classification. It addresses the limitations of traditional port- and payload-based methods as network traffic becomes increasingly encrypted and diverse. Drawing on practical considerations from real-world measurements, the framework combines flow-based feature extraction, time-aware data splitting, leakage-resistant experimental design, and interpretability analysis to avoid common methodological pitfalls. It comes with open-source Jupyter notebooks that implement the data-to-model pipeline, from traffic capture and dataset construction to model training and evaluation. The text also covers deployment considerations. Empirical validation on real-world encrypted traffic demonstrates the approach's effectiveness, robustness, and practical deployability.
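
Flow-based feature extraction can be illustrated with a minimal Python sketch. The sketch assumes packets have already been parsed into a hypothetical `Packet` record holding a timestamp, addresses, ports, protocol, and length. It groups the packets into bidirectional flows keyed by the 5-tuple and computes a few statistics that do not depend on the payload, so they remain available under encryption. Unlike a real flow meter, it applies no active or idle timeouts, so every packet that shares a 5-tuple lands in the same flow. The names `Packet`, `flow_key`, and `extract_flow_features` are illustrative and are not taken from the companion notebooks.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass(frozen=True)
class Packet:
    # Hypothetical per-packet record; field names are illustrative assumptions.
    ts: float    # capture timestamp in seconds
    src: str     # source IP address
    dst: str     # destination IP address
    sport: int   # source port
    dport: int   # destination port
    proto: int   # IP protocol number, e.g. 6 = TCP, 17 = UDP
    size: int    # packet length in bytes (payload content is never inspected)


def flow_key(p):
    """Bidirectional 5-tuple key: both directions of a conversation share one key."""
    a = (p.src, p.sport)
    b = (p.dst, p.dport)
    lo, hi = (a, b) if a <= b else (b, a)  # canonical endpoint order
    return (lo, hi, p.proto)


def extract_flow_features(packets):
    """Group packets into flows (no timeouts) and compute payload-independent statistics."""
    flows = defaultdict(list)
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        flows[flow_key(p)].append(p)

    features = {}
    for key, pkts in flows.items():
        sizes = [p.size for p in pkts]
        times = [p.ts for p in pkts]
        # Inter-arrival times between consecutive packets of the same flow.
        iats = [t2 - t1 for t1, t2 in zip(times, times[1:])]
        features[key] = {
            "packets": len(pkts),
            "bytes": sum(sizes),
            "duration": times[-1] - times[0],
            "mean_size": mean(sizes),
            "std_size": pstdev(sizes),
            "mean_iat": mean(iats) if iats else 0.0,
        }
    return features
```

The sketch orders the two endpoints canonically, so a request and its reply map to the same flow. A production metering process would also split long-lived conversations on timeouts and would typically keep per-direction counters.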

📝 Abstract
Modern networks carry increasingly diverse and encrypted traffic types that demand classification techniques beyond traditional port-based and payload-based methods. This tutorial provides a practical, end-to-end guide to building machine-learning-based network traffic flow classification systems. We cover the workflow from flow metering and dataset creation, through ground-truth labeling and feature engineering, to leakage-resistant experimental design, model training and evaluation, explainability, and deployment considerations. The tutorial focuses on supervised flow-based classification that remains effective under encryption and provides actionable guidance on algorithm selection, performance metrics, and realistic partitioning strategies, with emphasis on common real-world measurement artifacts and methodological pitfalls. A companion set of five Jupyter notebooks on GitHub implements the data-to-model workflow on real traffic captures, enabling readers to reproduce key steps. The intended audience includes researchers and practitioners with foundational networking knowledge who aim to design and deploy robust traffic classification systems in operational environments.
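
One way to realize leakage-resistant, time-aware partitioning is a chronological split. The model trains on earlier flows and is tested on later ones, and any flow that straddles the boundary is discarded. The sketch below assumes a hypothetical `FlowRecord` with start and end timestamps. The `temporal_split` function and its optional `gap` parameter, a buffer period whose flows are dropped, are illustrative choices and not the paper's exact procedure.

```python
from dataclasses import dataclass, field


@dataclass
class FlowRecord:
    # Hypothetical labeled flow record; field names are illustrative assumptions.
    start: float                                  # flow start timestamp (seconds)
    end: float                                    # flow end timestamp (seconds)
    features: dict = field(default_factory=dict)  # e.g. output of a feature extractor
    label: str = ""                               # ground-truth class


def temporal_split(flows, train_frac=0.7, gap=0.0):
    """Chronological train/test split that keeps flows from crossing the cutoff."""
    if not flows:
        raise ValueError("flows must be non-empty")
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must lie strictly between 0 and 1")
    ordered = sorted(flows, key=lambda f: f.start)
    # The cutoff is the start time of the first flow beyond the training fraction.
    cutoff = ordered[int(len(ordered) * train_frac)].start
    # Training flows must have finished before the cutoff, so a long flow that is
    # still active at the cutoff is dropped rather than shared between the sets.
    train = [f for f in ordered if f.end < cutoff]
    # Test flows start at or after cutoff + gap; flows inside the gap are discarded.
    test = [f for f in ordered if f.start >= cutoff + gap]
    return train, test
```

For example, `temporal_split(flows, train_frac=0.7, gap=60.0)` would keep a one-minute buffer between the two sets. A random shuffle can place flows from the same session or time window in both sets. This split keeps them apart, which is one common way to avoid optimistic test scores.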
Problem

Research questions and friction points this paper is trying to address.

network traffic classification
encrypted traffic
flow-based classification
machine learning
traffic analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

flow-based classification
machine learning
encrypted traffic
leakage-resistant design
reproducible workflow
Adrián Pekár
Budapest University of Technology and Economics, Hungary; CUJO LLC, Hungary
Richard Plný
Faculty of Information Technology, Czech Technical University in Prague
Karel Hynek
FIT CTU & CESNET a.l.e.
Network security, Privacy, Machine Learning