AutoFlow: An Autoencoder-based Approach for IP Flow Record Compression with Minimal Impact on Traffic Classification

📅 2024-09-17

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the storage and analysis burden of massive IP flow records in network monitoring, this paper proposes a lossy compression method based on deep autoencoders that supports protocol and encrypted-traffic classification directly in the compressed domain, without decompressing or reconstructing the original flows. The method uses end-to-end supervised training to jointly optimize compression ratio and downstream classification performance, relying on flow feature embeddings to preserve the information that discriminates between classes. Evaluated on a real-world traffic dataset, it achieves a 1.313× reduction in data size while maintaining 99.27% classification accuracy, 0.50 percentage points below the 99.77% obtained on uncompressed data, a small loss that the authors weigh against gains in storage and processing efficiency. The work empirically demonstrates that accurate traffic identification is feasible directly in the compressed domain, balancing compression efficiency, reconstruction-free operation, and analytical utility in a “compress-and-analyze” approach to flow data.
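The idea of jointly optimizing compression and classification can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the layer sizes (21 flow features, a 16-value code, 5 classes), the single-layer encoder and decoder, the untrained random weights, and the loss form (mean squared reconstruction error plus weighted cross-entropy). The point is only that the classifier reads the compressed code, never the reconstructed flow.

```python
import math
import random

random.seed(0)

# Hypothetical sizes, chosen only for illustration (21 / 16 is about 1.31).
N_FEATURES, N_LATENT, N_CLASSES = 21, 16, 5

def random_matrix(rows, cols):
    """Small random weights standing in for trained parameters."""
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

def matvec(matrix, vector):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

W_enc = random_matrix(N_LATENT, N_FEATURES)   # flow features -> code
W_dec = random_matrix(N_FEATURES, N_LATENT)   # code -> reconstructed flow
W_clf = random_matrix(N_CLASSES, N_LATENT)    # code -> class logits

def encode(flow):
    """Compress one flow record into a shorter code."""
    return [math.tanh(a) for a in matvec(W_enc, flow)]

def decode(code):
    """Lossy reconstruction, needed only for the training loss."""
    return matvec(W_dec, code)

def class_log_probs(code):
    """Log-softmax over class logits computed from the code alone."""
    logits = matvec(W_clf, code)
    top = max(logits)
    log_norm = top + math.log(sum(math.exp(l - top) for l in logits))
    return [l - log_norm for l in logits]

def joint_loss(flow, label, weight=1.0):
    """Reconstruction error plus weighted classification error."""
    code = encode(flow)
    recon = decode(code)
    mse = sum((r - x) ** 2 for r, x in zip(recon, flow)) / len(flow)
    cross_entropy = -class_log_probs(code)[label]
    return mse + weight * cross_entropy

# One synthetic flow record; real inputs would be normalized flow features.
flow = [random.gauss(0.0, 1.0) for _ in range(N_FEATURES)]
code = encode(flow)
log_probs = class_log_probs(code)
predicted = max(range(N_CLASSES), key=lambda k: log_probs[k])
loss = joint_loss(flow, label=2)
```

At inference time only `encode` and `class_log_probs` would be needed, which is what makes classification possible without decompression; `decode` appears only inside the loss.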

📝 Abstract

Network monitoring generates massive volumes of IP flow records, posing significant challenges for storage and analysis. This paper presents a novel deep learning-based approach to compressing these records using autoencoders, enabling direct analysis of compressed data without requiring decompression. Unlike traditional compression methods, our approach reduces data volume while retaining the utility of compressed data for downstream analysis tasks, including distinguishing modern application protocols and encrypted traffic from popular services. Through extensive experiments on a real-world network traffic dataset, we demonstrate that our autoencoder-based compression achieves a 1.313x reduction in data size while maintaining 99.27% accuracy in a multi-class traffic classification task, compared to 99.77% accuracy with uncompressed data. This marginal decrease in performance is offset by substantial gains in storage and processing efficiency. The implications of this work extend to more efficient network monitoring and scalable, real-time network management solutions.
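To put the reported figures in perspective, the short calculation below converts them into storage savings and error rates. It assumes that "1.313x reduction" means original size divided by compressed size, which the abstract does not state explicitly; the variable names are illustrative.

```python
# Figures quoted in the abstract.
COMPRESSION_RATIO = 1.313   # assumed to mean original size / compressed size
ACC_RAW = 99.77             # accuracy (%) on uncompressed flow records
ACC_COMPRESSED = 99.27      # accuracy (%) on compressed representations

size_fraction = 1 / COMPRESSION_RATIO         # compressed size relative to original
savings_pct = (1 - size_fraction) * 100       # storage saved, in percent
accuracy_drop_pp = ACC_RAW - ACC_COMPRESSED   # drop in percentage points
error_raw = 100 - ACC_RAW                     # misclassification rate (%), raw
error_compressed = 100 - ACC_COMPRESSED       # misclassification rate (%), compressed

print(f"compressed size: {size_fraction * 100:.1f}% of original")
print(f"storage saved:   {savings_pct:.1f}%")
print(f"accuracy drop:   {accuracy_drop_pp:.2f} percentage points")
print(f"error rate:      {error_raw:.2f}% -> {error_compressed:.2f}%")
```

Under that assumption, the compressed records occupy about 76.2% of the original volume (roughly 23.8% saved), while the misclassification rate rises from 0.23% to 0.73%.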

Problem

Research questions and friction points this paper is trying to address.

IP Traffic Compression

Network Monitoring

Data Analysis Efficiency

Innovation

Methods, ideas, or system contributions that make the work stand out.

AutoFlow

Deep Learning Compression

Network Traffic Analysis
