Can Drift-Adaptive Malware Detectors Be Made Robust? Attacks and Defenses Under White-Box and Black-Box Threats

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the insufficient robustness of malware detectors in scenarios where concept drift and adversarial attacks coexist. To tackle this challenge, the authors propose a general-purpose robustification framework that fine-tunes pretrained models on adversarially transformed inputs, jointly defending against white-box (e.g., PGD) and black-box (e.g., MalGuise) attacks. The study presents the first systematic investigation of the vulnerability of drift-adaptive detectors under such composite threats, revealing that the optimal defense strategy varies significantly across attack models. By combining adversarial domain adaptation with source- and target-domain fine-tuning, the approach improves generalization. Experiments on a Windows malware dataset show that the method reduces the success rate of PGD attacks from 100% to 3.2% and that of MalGuise attacks from 13% to 5.1%, while maintaining high detection accuracy and low deployment overhead.
📝 Abstract
Concept drift and adversarial evasion are two major challenges for deploying machine learning-based malware detectors. While both have been studied separately, their combination, the adversarial robustness of drift-adaptive detectors, remains unexplored. We study this problem using AdvDA, a recent malware detector that applies adversarial domain adaptation to align a labeled source domain with a target domain that has limited labels. The distribution shift between domains poses a unique challenge: robustness learned on the source may not transfer to the target, and existing defenses assume a fixed distribution. To address this, we propose a universal robustification framework that fine-tunes a pretrained AdvDA model on adversarially transformed inputs, agnostic to the attack type and choice of transformations. We instantiate it with five defense variants spanning two threat models: white-box PGD attacks in the feature space and black-box MalGuise attacks that modify malware binaries via functionality-preserving control-flow mutations. Across nine defense configurations, five monthly adaptation windows on Windows malware, and three false-positive-rate operating points, we find the undefended AdvDA completely vulnerable to PGD (100% attack success) and moderately vulnerable to MalGuise (13%). Our framework reduces these rates to as low as 3.2% and 5.1%, respectively, but the optimal strategy differs: source adversarial training is essential for PGD defenses yet counterproductive for MalGuise defenses, where target-only training suffices. Furthermore, robustness does not transfer across these two threat models. We provide deployment recommendations that balance robustness, detection accuracy, and computational cost.
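To make the white-box threat model concrete: a minimal sketch of feature-space L-infinity PGD and the "fine-tune on adversarially transformed inputs" loop the abstract describes. The toy logistic scorer, cross-entropy gradient, and all hyperparameters (`eps`, `alpha`, step counts) are illustrative assumptions for this example only, not the paper's AdvDA architecture or settings.

```python
import numpy as np

def pgd_attack(x, y, w, b, eps=0.1, alpha=0.02, steps=10):
    """L-inf PGD against a logistic scorer f(x) = sigmoid(w.x + b).
    Ascends the cross-entropy loss of the true label y in {0, 1},
    projecting back into an eps-ball around the clean features x."""
    x_adv = x.copy()
    for _ in range(steps):
        z = x_adv @ w + b
        p = 1.0 / (1.0 + np.exp(-z))          # predicted P(malware)
        grad = (p - y) * w                    # d(cross-entropy)/dx
        x_adv = x_adv + alpha * np.sign(grad) # signed gradient step
        x_adv = np.clip(x_adv, x - eps, x + eps)  # project to eps-ball
    return x_adv

def adversarial_finetune(X, y, w, b, lr=0.5, epochs=20, eps=0.1):
    """Fine-tune on adversarially transformed inputs: each epoch,
    regenerate PGD examples against the current weights, then take
    a gradient step on the adversarial batch."""
    for _ in range(epochs):
        X_adv = np.stack([pgd_attack(x, yi, w, b, eps)
                          for x, yi in zip(X, y)])
        z = X_adv @ w + b
        p = 1.0 / (1.0 + np.exp(-z))
        w = w - lr * (X_adv.T @ (p - y)) / len(y)
        b = b - lr * float(np.mean(p - y))
    return w, b
```

For a malware sample (`y = 1`), the attack pushes features toward a lower malware score while staying within `eps` of the original vector; fine-tuning then regenerates these perturbed inputs against the current weights every epoch, the same outer structure as standard PGD adversarial training.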
Problem

Research questions and friction points this paper is trying to address.

concept drift
adversarial evasion
malware detection
adversarial robustness
domain adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

adversarial robustness
concept drift
domain adaptation
malware detection
white-box and black-box attacks