Improving the Identification of Real-world Malware's DNS Covert Channels Using Locality Sensitive Hashing

📅 2025-11-25

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the low accuracy and poor generalizability of existing methods for identifying malware families that use DNS covert channels, this paper proposes a subdomain sequence analysis method based on Locality-Sensitive Hashing (LSH). First, the subdomain sequences of DNS queries are mapped to LSH fingerprints that capture their statistical similarity. Sequential features derived from these fingerprints are then fed into a Random Forest classifier for malware family classification and behavior recognition. According to the authors, this is the first work to apply LSH to DNS covert channel detection, and it improves detection of previously unseen or obfuscated malware variants. The reported experiments show higher detection accuracy and lower false positive rates than state-of-the-art approaches, along with better generalizability and robustness under domain shifts and query perturbations.
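
The fingerprinting step can be illustrated with a minimal sketch. The summary does not say which LSH scheme the authors use, so this example assumes MinHash over character 3-grams of a single subdomain label; the function names, the n-gram size, and the signature length are all hypothetical choices made for illustration.

```python
import hashlib


def char_ngrams(label: str, n: int = 3) -> set[str]:
    """Return the set of character n-grams of a subdomain label."""
    label = label.lower()
    if len(label) <= n:
        return {label}
    return {label[i:i + n] for i in range(len(label) - n + 1)}


def minhash(label: str, num_hashes: int = 64, n: int = 3) -> list[int]:
    """MinHash signature (assumed LSH scheme, not confirmed by the paper).

    For each of num_hashes salted hash functions, keep the smallest
    hash value seen over all n-grams of the label.
    """
    grams = char_ngrams(label, n)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")  # one salt per hash function
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(g.encode(), digest_size=8, salt=salt).digest(),
                "big",
            )
            for g in grams
        ))
    return signature


def estimated_jaccard(sig_a: list[int], sig_b: list[int]) -> float:
    """Fraction of matching positions, an estimate of n-gram set Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Two labels that share most of their n-grams agree in most signature positions, so near-duplicate encoded payloads land close together even when a few characters change. That property is what makes a similarity-preserving hash attractive for modified or previously unseen samples.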

📝 Abstract

Nowadays, malware increasingly uses DNS-based covert channels in order to evade detection and maintain stealthy communication with its command-and-control servers. While prior work has focused on detecting such activity, identifying specific malware families and their behaviors from captured network traffic remains challenging due to the variability of DNS. In this paper, we present the first application of Locality Sensitive Hashing to the detection and identification of real-world malware utilizing DNS covert channels. Our approach encodes DNS subdomain sequences into statistical similarity features that effectively capture anomalies indicative of malicious activity. Combined with a Random Forest classifier, our method achieves higher accuracy and reduced false positive rates than prior approaches, while demonstrating improved robustness and generalization to previously unseen or modified malware samples. We further demonstrate that our approach enables reliable classification of malware behavior (e.g., uploading or downloading of files), based solely on DNS subdomains.

Problem

Research questions and friction points this paper is trying to address.

Identifying the malware family behind DNS covert channel traffic remains challenging

Detecting malicious DNS activity with improved accuracy and reduced false positives

Classifying malware behaviors based solely on DNS subdomain sequences

Innovation

Methods, ideas, or system contributions that make the work stand out.

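Applies Locality Sensitive Hashing to the detection and identification of malware that uses DNS covert channels

Encodes subdomain sequences into statistical similarity features

Feeds these features to a Random Forest classifier to identify malware families and behaviors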
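
One way to picture how a subdomain sequence might become a fixed-length feature vector is sketched below. This is an assumption-laden illustration, not the authors' feature set: it uses SimHash as the LSH scheme, compares each query's fingerprint with the next one, and summarizes the resulting similarities with four statistics. The function names and the choice of statistics are hypothetical.

```python
import hashlib
from statistics import mean, pstdev


def simhash(label: str, bits: int = 64, n: int = 3) -> int:
    """SimHash fingerprint (assumed scheme): per-bit majority vote over hashed n-grams."""
    label = label.lower()
    # Labels shorter than n yield a single gram containing the whole label.
    grams = [label[i:i + n] for i in range(max(1, len(label) - n + 1))]
    votes = [0] * bits
    for g in grams:
        h = int.from_bytes(
            hashlib.blake2b(g.encode(), digest_size=bits // 8).digest(), "big"
        )
        for b in range(bits):
            votes[b] += 1 if (h >> b) & 1 else -1
    return sum(1 << b for b in range(bits) if votes[b] > 0)


def similarity(a: int, b: int, bits: int = 64) -> float:
    """1 minus the normalized Hamming distance between two fingerprints."""
    return 1.0 - bin(a ^ b).count("1") / bits


def sequence_features(subdomains: list[str]) -> list[float]:
    """Hypothetical features: mean, std, min, max similarity of consecutive queries."""
    fps = [simhash(s) for s in subdomains]
    sims = [similarity(x, y) for x, y in zip(fps, fps[1:])]
    if not sims:  # fewer than two queries: no pairwise similarity to summarize
        return [0.0, 0.0, 0.0, 0.0]
    return [mean(sims), pstdev(sims), min(sims), max(sims)]
```

A vector like this, computed per traffic window, has a fixed length regardless of how many queries the window contains, so it could be passed directly to a tree ensemble such as scikit-learn's `RandomForestClassifier` alongside family or behavior labels.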
