🤖 AI Summary
This work presents an end-to-end, reproducible workflow for supervised traffic flow classification that addresses the limitations of traditional port- and payload-based methods in the face of encrypted and increasingly diverse network traffic. It draws on practical considerations from real-world measurements and combines flow-based feature extraction, time-aware data splitting, leakage-resistant experimental design, and interpretability analysis to mitigate common methodological pitfalls. A companion set of open-source Jupyter notebooks covers the pipeline from traffic capture and dataset construction to model training and evaluation, with deployment considerations discussed alongside. The notebooks run on real traffic captures, so readers can reproduce the key steps of the workflow.
📝 Abstract
Modern networks carry increasingly diverse and encrypted traffic types that demand classification techniques beyond traditional port-based and payload-based methods. This tutorial provides a practical, end-to-end guide to building machine-learning-based network traffic flow classification systems. We cover the workflow from flow metering and dataset creation, through ground-truth labeling and feature engineering, to leakage-resistant experimental design, model training and evaluation, explainability, and deployment considerations. The tutorial focuses on supervised flow-based classification that remains effective under encryption and provides actionable guidance on algorithm selection, performance metrics, and realistic partitioning strategies, with emphasis on common real-world measurement artifacts and methodological pitfalls. A companion set of five Jupyter notebooks on GitHub implements the data-to-model workflow on real traffic captures, enabling readers to reproduce key steps. The intended audience includes researchers and practitioners with foundational networking knowledge who aim to design and deploy robust traffic classification systems in operational environments.
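To make the idea of realistic, leakage-resistant partitioning concrete, the following is a minimal sketch and is not taken from the companion notebooks. It assumes a hypothetical `Flow` record with a start timestamp, a single illustrative feature, and a label. The flows are split chronologically, and the normalization statistics are estimated from the training portion only, so that no information from later traffic reaches the model during training.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Flow:
    # All field names are hypothetical and chosen for illustration only.
    start_ts: float       # flow start time in seconds
    mean_pkt_size: float  # an example flow-level feature
    label: str            # ground-truth traffic class


def time_aware_split(flows, train_fraction=0.5):
    """Split chronologically: no training flow starts after a test flow."""
    ordered = sorted(flows, key=lambda f: f.start_ts)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


def fit_scaler(train):
    """Estimate normalization statistics from the training flows only."""
    values = [f.mean_pkt_size for f in train]
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 for a constant feature
    return mu, sigma


def scale(flows, mu, sigma):
    """Apply training-set statistics to any partition."""
    return [(f.mean_pkt_size - mu) / sigma for f in flows]


# Synthetic example records, deliberately out of chronological order.
flows = [
    Flow(50.0, 900.0, "video"),
    Flow(10.0, 120.0, "dns"),
    Flow(30.0, 1400.0, "video"),
    Flow(60.0, 250.0, "web"),
    Flow(20.0, 300.0, "web"),
    Flow(40.0, 100.0, "dns"),
]

train, test = time_aware_split(flows, train_fraction=0.5)
mu, sigma = fit_scaler(train)   # statistics come from the training set only
train_x = scale(train, mu, sigma)
test_x = scale(test, mu, sigma)  # test data reuses the training statistics
```

A random shuffle would instead mix earlier and later traffic across both partitions, which tends to produce optimistic scores that do not carry over to deployment.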