🤖 AI Summary
Traditional static analysis and signature-based detection struggle against metamorphic rootkits. This work proposes SeqShield, a dynamic behavioral analysis approach for Windows systems that identifies rootkits by monitoring runtime API call sequences, thereby eliminating reliance on static features. SeqShield extracts n-gram (bigram and trigram) features, uses the Gini impurity index for feature selection, and builds a random forest classifier. To evaluate robustness against advanced obfuscation techniques, the method uses a metamorphic code engine to generate diverse variant samples. Experimental results show that, with an optimized low-dimensional feature set, SeqShield achieves detection accuracies of 96.72% with bigrams and 97.81% with trigrams, balancing accuracy and computational efficiency.
📝 Abstract
Rootkits are among the most elusive types of malware, capable of bypassing traditional static analysis methods due to their metamorphic behavior. Signature-based detection techniques struggle against these threats, necessitating a shift toward dynamic analysis. We propose SeqShield, a behavior-based rootkit detection approach designed specifically for the Windows OS that uses API call sequences for dynamic behavior analysis. Instead of relying on static signatures, SeqShield examines the execution patterns of API calls, which inherently reflect malicious intent, to identify rootkit-like behavior. We also employ a metamorphic code engine to generate 10X mutated variants of rootkits, demonstrating their obfuscation strategies. SeqShield applies n-gram analysis to extract bigram and trigram features from these API call sequences. Among the models tested, Random Forest achieves the highest accuracy: 97.27% (bigram) and 96.17% (trigram). To optimize performance and reduce dimensionality, we rank features by importance using the Gini impurity index and iteratively select the most significant ones. The optimized lower-dimensional feature matrix significantly enhances detection efficiency without sacrificing accuracy. Using the optimized feature set, our approach achieves 96.72% accuracy for bigrams and 97.81% accuracy for trigrams.
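The n-gram extraction and Gini-based ranking described above can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each sample is a list of API names, and it scores each n-gram by the Gini impurity decrease of a single presence/absence split, a simplified stand-in for the importance a random forest averages over its trees. The function names, the toy traces, and the labels are all hypothetical.

```python
from collections import Counter


def ngrams(calls, n):
    """Return the length-n sliding windows over one API call sequence."""
    return [tuple(calls[i:i + n]) for i in range(len(calls) - n + 1)]


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k ** 2)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def gini_gain(samples, labels, gram, n):
    """Impurity decrease when samples are split on whether they contain `gram`."""
    present, absent = [], []
    for calls, label in zip(samples, labels):
        (present if gram in ngrams(calls, n) else absent).append(label)
    total = len(labels)
    weighted = (len(present) / total) * gini(present) \
        + (len(absent) / total) * gini(absent)
    return gini(labels) - weighted


def rank_ngrams(samples, labels, n, top_k):
    """Return the top_k n-grams by impurity decrease (ties broken alphabetically)."""
    vocab = {g for calls in samples for g in ngrams(calls, n)}
    ranked = sorted(vocab, key=lambda g: (-gini_gain(samples, labels, g, n), g))
    return ranked[:top_k]


# Toy traces, invented for illustration only (1 = rootkit-like, 0 = benign).
samples = [
    ["OpenProcess", "WriteProcessMemory", "CreateRemoteThread"],
    ["OpenProcess", "WriteProcessMemory", "CloseHandle"],
    ["CreateFileW", "ReadFile", "CloseHandle"],
    ["OpenProcess", "ReadFile", "CloseHandle"],
]
labels = [1, 1, 0, 0]

top_bigrams = rank_ngrams(samples, labels, n=2, top_k=2)
print(top_bigrams)
```

On the toy data, the two selected bigrams are the ones whose presence separates the two classes perfectly. In a full pipeline, the retained n-grams would become the columns of the reduced feature matrix passed to the classifier.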