A Hybrid Approach For Malware Classification Using Secondary Features Fusion

📅 2026-06-02

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study addresses a limitation of traditional malware detection methods: they cannot assign detected malware to specific families, which hinders precise threat response. The authors propose a hybrid classification framework that fuses multiple secondary features (API calls and both fixed- and variable-length n-grams), applies a customized feature selection method, and combines classifiers through a voting-based ensemble. Evaluated on the dataset provided by Microsoft in both binary and multi-class settings, the approach achieves an AUC of 0.989, an accuracy of 99.72%, and a log loss of 0.01, and the results are compared against state-of-the-art methods. These results indicate that the framework is effective for automated malware detection and family classification.
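To put the reported log loss of 0.01 in context, the following is a minimal sketch of the standard multi-class log loss (the mean negative log of the probability assigned to the true class). The function name `multiclass_log_loss` and the toy probabilities are illustrative assumptions and are not taken from the paper.

```python
import math


def multiclass_log_loss(y_true, probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    y_true: list of integer class indices.
    probs:  list of per-class probability rows, one row per sample.
    """
    total = 0.0
    for label, row in zip(y_true, probs):
        # Clip to avoid log(0) when a model is overconfident and wrong.
        p = min(max(row[label], eps), 1 - eps)
        total += -math.log(p)
    return total / len(y_true)


# Toy data (hypothetical): three samples, three families,
# each true family predicted with probability 0.99.
y_true = [0, 2, 1]
probs = [
    [0.99, 0.005, 0.005],
    [0.005, 0.005, 0.99],
    [0.005, 0.99, 0.005],
]
loss = multiclass_log_loss(y_true, probs)
print(round(loss, 4))
```

In this toy case every true class receives probability 0.99, so the loss is -ln(0.99), roughly 0.01. A log loss near 0.01 therefore corresponds to a model that is, on average, about 99% confident in the correct family.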

📝 Abstract

The number of malware samples (whether variants or novel) is rapidly increasing, making malware detection and mitigation a complex problem. One approach to improving malware mitigation is automatic detection and malware family classification. However, traditional malware detection methods cannot classify detected malware into their respective families, which hinders effective mitigation. This paper therefore proposes a method that automates malware detection and classifies the detected malware into its respective family. The proposed method extracts relevant malware features, such as API calls and fixed- and variable-length n-grams, using a customized feature selection method, and then fuses them. For the predictive model, a voting-based approach is proposed for algorithm fusion. In the experimental evaluation, both binary and multi-class classification are applied to the data set provided by Microsoft, and the results are compared with the state of the art. The experimental results indicate the effectiveness and efficiency of the proposed approach, with an AUC of 0.989, an accuracy of 99.72%, and a log loss of 0.01.
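The pipeline described in the abstract (n-gram and API-call features, feature selection, fusion, and voting) can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: "variable-length" is approximated by using several n-gram sizes, the selection step is a simple document-frequency filter standing in for the paper's customized method, selection is applied after fusion, and all function names and family labels are hypothetical.

```python
from collections import Counter


def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count every contiguous n-byte window in a sample."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def fuse_features(sample_bytes, api_calls, ngram_sizes=(1, 2)):
    """Merge n-gram counts and API-call counts into one feature dict.

    Prefixes keep the feature sources from colliding after fusion.
    """
    fused = {}
    for n in ngram_sizes:
        for gram, count in byte_ngrams(sample_bytes, n).items():
            fused[f"ng{n}:{gram.hex()}"] = count
    for call, count in Counter(api_calls).items():
        fused[f"api:{call}"] = count
    return fused


def select_top_k(feature_dicts, k):
    """Keep the k features present in the most samples.

    Assumption: a placeholder for the paper's customized selection.
    """
    doc_freq = Counter()
    for features in feature_dicts:
        doc_freq.update(features.keys())
    ranked = sorted(doc_freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


def majority_vote(predictions):
    """Hard voting: return the most common label (ties broken by name)."""
    counts = Counter(predictions)
    best = max(counts.values())
    return min(label for label, c in counts.items() if c == best)


# Toy sample (hypothetical bytes and API calls).
sample = bytes([0x55, 0x8B, 0xEC, 0x55, 0x8B])
apis = ["CreateFileW", "WriteFile", "WriteFile"]
fused = fuse_features(sample, apis)
kept = select_top_k([fused, {"api:WriteFile": 1}], k=1)

# Labels predicted by three hypothetical base classifiers.
winner = majority_vote(["family_a", "family_b", "family_a"])
print(fused["ng2:558b"], kept, winner)
```

Hard majority voting is used here only because it is the simplest voting scheme; the abstract does not say whether the paper uses hard or soft (probability-averaged) voting.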

Problem

Research questions and friction points this paper is trying to address.

malware classification

malware detection

malware family

feature fusion

automatic classification

Innovation

Methods, ideas, or system contributions that make the work stand out.

feature fusion

malware classification

voting-based ensemble