Universal Embedding Function for Traffic Classification via QUIC Domain Recognition Pretraining: A Transfer Learning Success

📅 2025-02-18

🤖 AI Summary

This work addresses key challenges in encrypted traffic classification (rapid protocol evolution, scarce labeled data, and poor model generalization) by proposing a general-purpose embedding learning approach pretrained on QUIC encrypted traffic. The core method uses the Server Name Indication (SNI) domain, which can be extracted from QUIC handshakes, as the class label for pretraining a deep embedding model on a task with a large number of classes. A disjoint class setup and the ArcFace loss function are used to make the embeddings more discriminative and more transferable. The pretrained embedding model is then transferred to five well-known traffic classification datasets, where classification is performed by nearest-neighbor search in the embedding space. The key contribution is a universal embedding function for encrypted traffic that is reused across tasks rather than trained for a single task. Experiments show state-of-the-art performance on four of the five datasets. The model architecture, trained weights, and transfer learning experiments are publicly released.
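The disjoint class setup can be illustrated with a minimal Python sketch. It assumes the usual metric-learning convention of holding out entire classes (here, SNI domains) rather than individual samples, so that embeddings are evaluated on domains never seen during pretraining. The helper `disjoint_class_split` and its parameters are hypothetical and are not part of the published code.

```python
import random

def disjoint_class_split(samples, test_class_fraction=0.3, seed=0):
    """Split (features, label) pairs so that train and test share no labels.

    Hypothetical helper: labels stand in for SNI domains.
    """
    # Collect the distinct classes and shuffle them reproducibly
    classes = sorted({label for _, label in samples})
    rng = random.Random(seed)
    rng.shuffle(classes)
    # Hold out a fraction of whole classes (at least one) for evaluation
    n_test = max(1, round(len(classes) * test_class_fraction))
    test_classes = set(classes[:n_test])
    train = [s for s in samples if s[1] not in test_classes]
    test = [s for s in samples if s[1] in test_classes]
    return train, test
```

For example, with 100 samples spread over 10 hypothetical domains, three whole domains land in the test split and none of their samples appear in training.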

📝 Abstract

Encrypted traffic classification (TC) methods must adapt to new protocols and extensions as well as to advancements in other machine learning fields. In this paper, we follow a transfer learning setup best known from computer vision. We first pretrain an embedding model on a complex task with a large number of classes and then transfer it to five well-known TC datasets. The pretraining task is recognition of SNI domains in encrypted QUIC traffic, which in itself is a problem for network monitoring due to the growing adoption of TLS Encrypted Client Hello. Our training pipeline, featuring a disjoint class setup, ArcFace loss function, and a modern deep learning architecture, aims to produce universal embeddings applicable across tasks. The proposed solution, based on nearest neighbors search in the embedding space, surpasses SOTA performance on four of the five TC datasets. A comparison with a baseline method utilizing raw packet sequences revealed unexpected findings with potential implications for the broader TC field. We published the model architecture, trained weights, and transfer learning experiments.
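The ArcFace loss works as follows. It L2-normalizes each embedding and each class center, so that their dot product equals cos θ. It then adds an angular margin m to the angle θ between a sample and its own class center and applies a scaled softmax. The target logit becomes s·cos(θ_y + m), while the other logits stay s·cos θ_j, which pulls same-class embeddings together on the hypersphere. Below is a minimal NumPy sketch of this standard formulation. The function name and the values of the scale s and margin m are illustrative assumptions, not the hyperparameters used in the paper. The sketch also omits the θ + m > π correction that some implementations add.

```python
import numpy as np

def arcface_loss(embeddings, weights, labels, s=64.0, m=0.5):
    """Additive angular margin (ArcFace) loss, averaged over the batch.

    embeddings: (N, D) array, weights: (C, D) class centers, labels: (N,) ints.
    Illustrative defaults for s and m; omits the theta + m > pi correction.
    """
    labels = np.asarray(labels)
    # L2-normalize so that dot products are cosines of angles
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(x @ w.T, -1.0, 1.0)  # (N, C)
    rows = np.arange(len(labels))
    theta = np.arccos(cos[rows, labels])
    # Add the angular margin only to each sample's target-class angle
    logits = cos.copy()
    logits[rows, labels] = np.cos(theta + m)
    logits *= s
    # Numerically stable softmax cross-entropy
    logits -= logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[rows, labels].mean()
```

With m = 0 this reduces to a normalized-softmax cross-entropy. A positive margin makes the loss strictly larger for correctly placed samples, which forces tighter class clusters during pretraining.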

Problem

Research questions and friction points this paper is trying to address.

Adapt TC methods to new protocols and extensions

Pretrain embedding model on SNI domain recognition in QUIC traffic

Achieve SOTA performance on TC datasets

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transfer learning setup from computer vision

Pretraining on QUIC domain recognition

Nearest neighbors search in embedding space
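The nearest-neighbor step can be sketched in Python. The sketch assumes cosine similarity between L2-normalized embeddings and a simple majority vote over the k nearest labeled samples. The function name, the value of k, and the voting rule are illustrative choices, not taken from the paper, and the embeddings are assumed to come from an already pretrained model.

```python
from collections import Counter

import numpy as np

def knn_predict(train_emb, train_labels, query_emb, k=5):
    """Label each query by majority vote among its k most cosine-similar training embeddings."""
    a = train_emb / np.linalg.norm(train_emb, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    sims = q @ a.T  # (Q, N) cosine similarities
    k = min(k, len(train_labels))
    top = np.argsort(-sims, axis=1)[:, :k]  # indices of the k nearest neighbors, closest first
    preds = []
    for row in top:
        votes = Counter(train_labels[i] for i in row)
        # On a tie, most_common(1) returns the label inserted first, i.e. from the closer neighbor
        preds.append(votes.most_common(1)[0][0])
    return preds
```

In this sketch, adapting to a new TC dataset only requires embedding that dataset's labeled samples once. No gradient updates are needed to classify new queries.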
