A Source Domain is All You Need: Source-Only Cross-OS Transfer Learning for APT Anomaly Detection via Semantic Alignment and Optimal Transport

📅 2026-06-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses source-only cross-operating-system (cross-OS) detection of Advanced Persistent Threats (APTs), where no labeled target-domain data are available, by proposing an anomaly detection framework grounded in semantic alignment and optimal transport (OT). Process behavior is described in natural language and embedded with pretrained language models, and anomaly evidence is drawn from three channels: semantic deviation from source-normal prototypes, structural deviation captured by a graph autoencoder, and geometric deviation measured with OT. The work introduces entropy-weighted, angle-aware, and density-aware OT variants to model behavioral uncertainty, directional drift, and sparse support, together with an OT barycenter-based anomaly score. Evaluated on DARPA Transparent Computing data spanning Linux, Windows, BSD, and Android (two APT scenarios, twelve cross-OS transfer pairs), the method outperforms existing source-only baselines in both ROC-AUC and nDCG.
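The semantic channel and the idea of combining several evidence channels can be illustrated with a minimal NumPy sketch. It assumes target and prototype embeddings are already available as row vectors (for example, from a pretrained language model, with prototypes taken as centroids of source-normal embeddings). The names `semantic_deviation`, `rank_normalize`, and `fuse_channels` are hypothetical, and the rank-averaging fusion rule is an illustrative assumption rather than the paper's own combination scheme.

```python
import numpy as np


def semantic_deviation(target, prototypes):
    """Score = 1 - max cosine similarity to any source-normal prototype.

    Higher means the target embedding is further (in angle) from every prototype.
    """
    t = target / np.linalg.norm(target, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return 1.0 - (t @ p.T).max(axis=1)


def rank_normalize(scores):
    """Map scores to [0, 1] by rank so channels on different scales are comparable."""
    ranks = np.argsort(np.argsort(scores))
    return ranks / max(len(scores) - 1, 1)


def fuse_channels(*channels):
    """Hypothetical fusion rule: average of the rank-normalized channel scores."""
    return np.mean(
        [rank_normalize(np.asarray(c, dtype=float)) for c in channels], axis=0
    )


if __name__ == "__main__":
    prototypes = np.array([[1.0, 0.0], [0.0, 1.0]])
    target = np.array([[1.0, 0.1], [0.0, 1.0], [-1.0, -1.0]])
    sem = semantic_deviation(target, prototypes)
    structural = np.array([0.1, 0.2, 0.9])  # placeholder for a graph-autoencoder score
    print(sem, fuse_channels(sem, structural))
```

Rank normalization is used here only because semantic, structural, and transport scores live on unrelated scales; a learned or weighted combination would be an equally plausible design.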
📝 Abstract
Advanced Persistent Threats (APTs) are stealthy, multi-stage cyberattacks whose detection is difficult due to scarce labeled traces, severe class imbalance, and the challenge of generating realistic malicious behavior. These challenges are amplified in cross-operating-system (cross-OS) settings, where a detector trained on one source platform must be deployed on an unlabeled target platform without access to target-domain labels. We study this source-only cross-OS APT detection problem using system-level provenance traces and propose a transport-based framework for ranking anomalous target processes under zero target supervision. The framework abstracts process behavior into structured natural-language descriptions, embeds them using pretrained language models, and constructs a source-normal reference for target scoring. It combines three evidence channels: semantic deviation from source-normal prototypes, structural deviation captured by graph autoencoding, and geometric deviation measured through Optimal Transport (OT). The main contribution is an OT-based barycentric anomaly score that projects target embeddings onto the source-normal manifold and quantifies residual transport mismatch. We further introduce entropy-weighted, angle-aware, and density-aware OT variants to capture uncertainty, directional drift, and sparse-support behavior. Evaluation on DARPA Transparent Computing data spanning Linux, Windows, BSD, and Android, across two APT scenarios and twelve cross-OS transfer pairs, shows that the proposed framework improves ROC-AUC and nDCG over source-only anomaly-detection baselines. The results demonstrate that source-only provenance modeling, combined with semantic abstraction and OT-based anomaly scoring, can support practical cross-platform APT detection without target-domain supervision.
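The barycentric idea in the abstract (project target embeddings onto the source-normal manifold and measure the residual) can be sketched with entropic OT. In this sketch each target embedding x_i is mapped to the plan-weighted average of source-normal embeddings, T(x_i) = Σ_j P_ij y_j / Σ_j P_ij, and its anomaly score is the residual norm ||x_i − T(x_i)||. The uniform marginals, squared-Euclidean cost, cost normalization, Sinkhorn solver, and the names `sinkhorn` and `barycentric_ot_scores` are all assumptions for illustration; the paper's entropy-weighted, angle-aware, and density-aware variants are not reproduced here.

```python
import numpy as np


def sinkhorn(a, b, C, reg=0.05, n_iter=500):
    """Entropic OT plan between weights a (n,) and b (m,) for cost matrix C (n, m)."""
    K = np.exp(-C / reg)  # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)    # match row marginals
        v = b / (K.T @ u)  # match column marginals
    return u[:, None] * K * v[None, :]


def barycentric_ot_scores(target, source, reg=0.05, n_iter=500):
    """Residual between each target embedding and its barycentric projection
    onto the source-normal embeddings (assumed uniform weights on both sides)."""
    n, m = len(target), len(source)
    a = np.full(n, 1.0 / n)
    b = np.full(m, 1.0 / m)
    # Squared Euclidean cost, rescaled to [0, 1] so exp(-C / reg) does not underflow.
    C = ((target[:, None, :] - source[None, :, :]) ** 2).sum(axis=-1)
    C = C / max(C.max(), 1e-12)
    P = sinkhorn(a, b, C, reg, n_iter)
    # Barycentric projection: plan-weighted average of source points per target row.
    projection = (P @ source) / P.sum(axis=1, keepdims=True)
    return np.linalg.norm(target - projection, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.normal(size=(200, 2))  # stand-in for source-normal embeddings
    target = np.vstack([rng.normal(size=(50, 2)), [[10.0, 10.0]]])  # last row is off-manifold
    scores = barycentric_ot_scores(target, source)
    print(np.argsort(scores)[::-1][:5])  # highest-ranked target rows
```

Because the projection is a convex combination of source-normal points, a target embedding far from the source support keeps a large residual regardless of how its mass is spread. An angle-aware variant could plausibly swap the cost for a cosine distance, but that substitution is an assumption, not a description of the paper's formulation.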
Problem

Research questions and friction points this paper is trying to address.

cross-OS
APT detection
source-only
anomaly detection
transfer learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimal Transport
Cross-OS Transfer Learning
Semantic Alignment
Provenance-based APT Detection
Source-Only Anomaly Scoring