🤖 AI Summary
To address out-of-distribution (OOD) identification in encrypted traffic classification, particularly for emerging applications, the authors propose TAO-Net, a two-stage adaptive framework. Stage I combines Transformer inter-layer transformation smoothness with feature analysis to separate in-distribution (ID) traffic from OOD traffic. Stage II uses large language models (LLMs) with a semantic-enhanced prompt strategy, recasting OOD classification as a generation task so that fine-grained labels can be produced without a predefined label set. The framework therefore classifies both ID and OOD traffic rather than relying on the conventional closed-set assumption. On three datasets, the method achieves macro-precision of 96.81–97.70% and macro-F1 of 96.77–97.68%, compared with 44.73–86.30% macro-precision for previous methods (a gap of up to about 53 percentage points), with the largest gains on emerging applications.
📝 Abstract
Encrypted traffic classification aims to identify applications or services by analyzing network traffic data. One of the critical challenges is the continuous emergence of new applications, which generates Out-of-Distribution (OOD) traffic patterns that deviate from known categories and are not well represented by predefined models. Current approaches rely on predefined categories, which limits their effectiveness in handling unknown traffic types. Although some methods mitigate this limitation by assigning all unknown traffic to a single "Other" category, they cannot classify that traffic at a fine granularity. In this paper, we propose a Two-stage Adaptive OOD classification Network (TAO-Net) that achieves accurate classification for both In-Distribution (ID) and OOD encrypted traffic. The method uses a two-stage design. The first stage employs a hybrid OOD detection mechanism that integrates transformer-based inter-layer transformation smoothness and feature analysis to distinguish between ID and OOD traffic. The second stage leverages large language models with a novel semantic-enhanced prompt strategy to transform OOD traffic classification into a generation task, enabling flexible fine-grained classification without relying on predefined labels. Experiments on three datasets demonstrate that TAO-Net achieves 96.81–97.70% macro-precision and 96.77–97.68% macro-F1, outperforming previous methods that reach only 44.73–86.30% macro-precision, particularly in identifying emerging network applications.
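The two-stage routing described above can be illustrated with a minimal sketch. It is not TAO-Net's implementation: the abstract does not define the smoothness measure, so the sketch assumes, purely for illustration, that smoothness is measured as the mean cosine distance between consecutive layer representations, and it omits the feature-analysis component of the hybrid detector. The names `roughness_score`, `route`, `classify_id`, and `generate_label` are hypothetical, and the two callables stand in for a closed-set classifier and an LLM-based label generator.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity; zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def roughness_score(layer_states: List[Vector]) -> float:
    """Assumed smoothness proxy: mean cosine distance between consecutive layers.

    A higher value means the representation changes more abruptly from layer
    to layer, which this sketch treats as evidence of OOD input.
    """
    if len(layer_states) < 2:
        raise ValueError("need representations from at least two layers")
    steps = [cosine_distance(a, b) for a, b in zip(layer_states, layer_states[1:])]
    return sum(steps) / len(steps)


def route(
    layer_states: List[Vector],
    threshold: float,
    classify_id: Callable[[Vector], str],
    generate_label: Callable[[Vector], str],
) -> Tuple[str, str]:
    """Stage I decides ID vs. OOD; Stage II is invoked only for OOD flows."""
    final_state = layer_states[-1]
    if roughness_score(layer_states) > threshold:
        # Hypothetical stand-in for the LLM generation step.
        return "OOD", generate_label(final_state)
    # Hypothetical stand-in for the closed-set classifier.
    return "ID", classify_id(final_state)


# Toy per-layer representations (made-up values, not real traffic features).
smooth_flow = [[1.0, 0.0], [1.0, 0.1], [1.0, 0.2]]
abrupt_flow = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

known = lambda v: "known-app"
novel = lambda v: "generated-label"

smooth_result = route(smooth_flow, 0.5, known, novel)
abrupt_result = route(abrupt_flow, 0.5, known, novel)
```

In this toy setup the slowly drifting flow stays on the ID path and the flow whose representation rotates sharply between layers is sent to the generative stage. The threshold of 0.5 is arbitrary; in practice it would have to be calibrated on held-out ID traffic.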