🤖 AI Summary
This work addresses the real-time bottleneck of flow correlation attacks on early network traffic (e.g., Tor-anonymized communications). We propose an efficient flow correlation method tailored to extremely early packet sequences, specifically the first 3–5 packets of a flow. Our approach introduces a novel multi-view triplet network architecture, yielding two models: Early-MFC and its lightweight variant, Early-MFC+. By jointly modeling transport-layer payload content and inter-packet delays, we integrate metric learning, contrastive learning, and Bayesian decision theory to learn robust flow representations in a shared embedding space. Evaluated on real-world Tor traffic, our method achieves 92.3% correlation accuracy using only the first five packets, outperforming state-of-the-art methods by 3.8× in inference speed and reducing the false positive rate by 67%. To our knowledge, this is the first flow correlation solution that meets stringent millisecond-level latency constraints (e.g., financial fraud detection) while maintaining high accuracy and low computational overhead.
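As a rough illustration of jointly using payload content and inter-packet timing, the sketch below turns the first k packets of a flow into two views: a payload view and an inter-packet delay (IPD) view. It is a minimal sketch under stated assumptions, not the authors' pipeline. The `(timestamp, payload_bytes)` packet tuple, the helper name `extract_views`, the defaults `k=5` and `payload_len=16`, the scaling of bytes to [0, 1], and the zero-padding are all hypothetical choices.

```python
from typing import List, Sequence, Tuple

# Hypothetical packet record: (timestamp in seconds, transport-layer payload bytes).
Packet = Tuple[float, bytes]


def extract_views(
    packets: Sequence[Packet],
    k: int = 5,             # number of early packets kept (assumed)
    payload_len: int = 16,  # payload bytes kept per packet (assumed)
) -> Tuple[List[List[float]], List[float]]:
    """Build a payload view and an IPD view from the first k packets of a flow."""
    early = list(packets[:k])

    # Payload view: first `payload_len` bytes of each packet scaled to [0, 1],
    # zero-padded per packet and then up to k rows.
    payload_view: List[List[float]] = []
    for _, payload in early:
        row = [b / 255.0 for b in payload[:payload_len]]
        row += [0.0] * (payload_len - len(row))
        payload_view.append(row)
    while len(payload_view) < k:
        payload_view.append([0.0] * payload_len)

    # IPD view: gaps between consecutive timestamps (k - 1 values), zero-padded.
    times = [t for t, _ in early]
    ipd_view = [later - earlier for earlier, later in zip(times, times[1:])]
    ipd_view += [0.0] * (k - 1 - len(ipd_view))
    return payload_view, ipd_view


# Usage: a three-packet flow is padded out to the fixed k-packet shape.
flow = [(0.000, b"\x16\x03\x01"), (0.012, b"\x16\x03\x03\x00"), (0.050, b"")]
payload_view, ipd_view = extract_views(flow, k=5, payload_len=4)
```

Fixed-size views like these are what one plausible design would feed into per-view encoders before fusing them into a shared embedding; the encoders themselves are omitted here.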
📝 Abstract
Flow correlation attacks are efficient network attacks that aim to expose the users of anonymous network services such as Tor. Conducting such attacks during the early stages of network communication is particularly critical in scenarios that demand rapid decision-making, such as cybercrime detection or financial fraud prevention. Although recent studies have advanced flow correlation attack techniques, research specifically addressing flow correlation on early network traffic remains limited. Moreover, because of model complexity, training costs, and real-time requirements, existing techniques cannot be directly applied to flow correlation on early network traffic. In this paper, we propose Early-MFC, a flow correlation attack on early network traffic based on multi-view triplet networks. The approach extracts multi-view traffic features from the transport-layer payload and the inter-packet delay (IPD), then integrates this multi-view flow information by converting the extracted features into shared embeddings. Using metric learning and contrastive learning, it optimizes the embedding space so that similar flows are mapped close together while dissimilar flows are pushed farther apart. Finally, Bayesian decision theory is applied to determine whether flows are correlated, enabling high-accuracy flow correlation on early network traffic. Furthermore, we investigate flow correlation attacks under extra-early network traffic conditions, where even fewer packets are available. To address this challenge, we propose Early-MFC+, which uses payload data to construct embedded feature representations, ensuring robust performance even when very few packets are available.
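The metric-learning objective and the Bayesian decision step can be sketched in plain Python. This is an illustrative sketch, not the paper's formulation. It assumes Euclidean distance between embeddings, the standard triplet margin loss with a hypothetical `margin=1.0`, and Gaussian class-conditional densities for the anchor-to-candidate distance of correlated and uncorrelated pairs. Their parameters (`mu_c`, `sd_c`, `mu_u`, `sd_u`) and the prior would in practice have to be estimated from held-out data. All function names and default values here are hypothetical.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two embeddings of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # hypothetical margin
) -> float:
    """Triplet margin loss: zero once the positive (correlated) flow is closer
    to the anchor than the negative (uncorrelated) flow by at least `margin`."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def _log_gaussian(x: float, mean: float, std: float) -> float:
    """Log density of a univariate Gaussian (log space avoids underflow)."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def posterior_correlated(
    distance: float,
    prior_corr: float = 0.5,               # must lie strictly in (0, 1)
    mu_c: float = 0.3, sd_c: float = 0.2,  # assumed distance stats, correlated pairs
    mu_u: float = 1.5, sd_u: float = 0.4,  # assumed distance stats, uncorrelated pairs
) -> float:
    """P(correlated | distance) via Bayes' rule with Gaussian likelihoods."""
    log_c = math.log(prior_corr) + _log_gaussian(distance, mu_c, sd_c)
    log_u = math.log(1.0 - prior_corr) + _log_gaussian(distance, mu_u, sd_u)
    return _sigmoid(log_c - log_u)


def is_correlated(distance: float, prior_corr: float = 0.5, threshold: float = 0.5) -> bool:
    """Declare a pair correlated when the posterior exceeds `threshold`
    (0.5 is the minimum-error-rate rule under equal misclassification costs)."""
    return posterior_correlated(distance, prior_corr) > threshold
```

The prior matters in this setting. With the default densities above, a distance of 0.7 yields a posterior of about 0.67 when `prior_corr=0.5`, but only about 0.02 when `prior_corr=0.01`. This reflects how rare truly correlated pairs are when one flow is compared against many candidates.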