AI Summary
To address the low accuracy and poor generalizability of existing methods for identifying malware families that use DNS covert channels, this paper proposes a subdomain sequence analysis method based on Locality-Sensitive Hashing (LSH). First, subdomain sequences from DNS queries are mapped to LSH fingerprints that capture their statistical similarity. Sequential features are then extracted from these fingerprints and fed into a Random Forest classifier for malware family classification and behavioral pattern recognition. The authors describe this as the first application of LSH to DNS covert channel detection and report improved detection of previously unseen or obfuscated malware variants. Experimental results show higher detection accuracy and lower false positive rates than prior approaches, along with better generalizability and robustness under domain shifts and query perturbations.
Abstract
Malware increasingly uses DNS-based covert channels to evade detection and maintain stealthy communication with its command-and-control servers. While prior work has focused on detecting such activity, identifying specific malware families and their behaviors from captured network traffic remains challenging due to the variability of DNS. In this paper, we present the first application of Locality-Sensitive Hashing (LSH) to the detection and identification of real-world malware utilizing DNS covert channels. Our approach encodes DNS subdomain sequences into statistical similarity features that effectively capture anomalies indicative of malicious activity. Combined with a Random Forest classifier, our method achieves higher accuracy and lower false positive rates than prior approaches, while demonstrating improved robustness and generalization to previously unseen or modified malware samples. We further demonstrate that our approach enables reliable classification of malware behavior (e.g., uploading or downloading of files) based solely on DNS subdomains.
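To make the idea of encoding subdomain sequences as similarity features concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say which LSH scheme is used, so the sketch assumes MinHash over character trigrams. The names `shingles`, `minhash`, `similarity`, and `sequence_features`, the signature length of 64, and the choice of mean/min/max statistics are all hypothetical choices made for illustration.

```python
import hashlib
from statistics import mean

NUM_HASHES = 64  # signature length; an illustrative value, not taken from the paper


def shingles(label: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a subdomain label."""
    label = label.lower()
    if len(label) < n:
        return {label}
    return {label[i:i + n] for i in range(len(label) - n + 1)}


def _hash(seed: int, token: str) -> int:
    """One member of a seeded hash family, built on BLAKE2b with a salt."""
    digest = hashlib.blake2b(
        token.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash(label: str) -> tuple[int, ...]:
    """MinHash signature: the minimum hash of the shingle set per seed."""
    grams = shingles(label)
    return tuple(min(_hash(seed, g) for g in grams) for seed in range(NUM_HASHES))


def similarity(a: tuple[int, ...], b: tuple[int, ...]) -> float:
    """Fraction of matching signature slots, an estimate of Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def sequence_features(subdomains: list[str]) -> dict[str, float]:
    """Summarize the similarity between consecutive subdomains in a query sequence."""
    sigs = [minhash(s) for s in subdomains]
    sims = [similarity(p, q) for p, q in zip(sigs, sigs[1:])]
    if not sims:  # fewer than two queries: no pairwise similarity to measure
        return {"mean": 0.0, "min": 0.0, "max": 0.0}
    return {"mean": mean(sims), "min": min(sims), "max": max(sims)}


if __name__ == "__main__":
    # Made-up example labels resembling chunked data in a covert channel.
    queries = ["chunk0001data", "chunk0002data", "chunk0003data"]
    print(sequence_features(queries))
```

Under these assumptions, each query sequence yields a fixed-length feature vector that could be passed to a classifier such as scikit-learn's `RandomForestClassifier`. The features the paper actually extracts may differ from these summary statistics.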