🤖 AI Summary
This study formalizes the problem of selecting machine learning configurations for malware detection under multiple operational constraints and proposes a context-aware Framework for Decision-making (FDM). The framework uses a Weighted Configuration Compatibility Score (WCCS) to map five operational parameters (platform constraint, resource budget, response latency, update frequency, and detection sensitivity) onto ranked recommendations across nine configuration dimensions. In the experiments, XGBoost achieves 97.46% test accuracy on binary classification with less than 70 MB of RAM; class-incremental learning with EfficientNetB0 loses only 0.65 percentage points of accuracy across 11 incremental steps; transfer learning cuts training time by a factor of 2.14 on average; and autoencoder-based preprocessing speeds up training 14-fold at a cost of 0.86 percentage points of accuracy. The work offers a quantifiable approach to configuration selection for real-world deployment scenarios.
📝 Abstract
Selecting appropriate machine learning (ML) configurations for malware detection is a complex, multi-criteria problem. Model choice, feature engineering, and update mechanisms must jointly satisfy operational constraints that vary across deployment contexts. This paper proposes the Framework for Decision-making (FDM) for building ML-based malware detection systems. The FDM formalises this selection process using the Weighted Configuration Compatibility Score (WCCS), a multi-criteria scoring function that maps five operational parameters (platform constraint, resource budget, response latency, update frequency, and detection sensitivity) to ranked recommendations across nine configuration dimensions. To validate the framework, four experiments were conducted on three datasets (a private Windows API dataset, the public Malimg image benchmark, and an Android static API dataset). Key results include: (i) XGBoost achieved the best accuracy-to-resource ratio in binary classification (97.46 % test accuracy, <70 MB RAM), outperforming LSTM/BiLSTM, which consumed up to 2.8 GB; (ii) in multi-class classification, classical models (XGBoost 79.03 %) outperformed recurrent deep models (BiLSTM 72.27 %), reversing the binary ranking; (iii) class-incremental learning with EfficientNetB0 maintained 99.13 % accuracy with only 0.65 pp degradation across 11 incremental steps; (iv) transfer learning reduced training time by a factor of 2.14 on average for image-based malware data without significant accuracy cost; and (v) autoencoder pre-processing yielded a 14-fold training speedup at a cost of only 0.86 pp accuracy. These findings confirm that the optimal ML configuration is context-dependent, validating the FDM's core premise and demonstrating its practical utility for cybersecurity practitioners.
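The abstract does not give the WCCS formula, so the following is only a minimal sketch of how such a multi-criteria score could work. It assumes the score is a weighted mean of per-parameter compatibility values in [0, 1]. The function names, the weights, and every compatibility value are hypothetical placeholders for illustration, not results or definitions from the paper.

```python
# Hypothetical sketch of a weighted compatibility score; NOT the paper's WCCS definition.

# The five operational parameters named in the abstract.
PARAMETERS = (
    "platform_constraint",
    "resource_budget",
    "response_latency",
    "update_frequency",
    "detection_sensitivity",
)


def wccs(weights, compatibility):
    """Return the weighted mean of per-parameter compatibility values (assumed in [0, 1])."""
    total_weight = sum(weights[p] for p in PARAMETERS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(weights[p] * compatibility[p] for p in PARAMETERS)
    return weighted_sum / total_weight


def rank_options(weights, options):
    """Rank the candidate options of one configuration dimension, best score first."""
    scored = [(name, wccs(weights, compat)) for name, compat in options.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Illustrative deployment context: the resource budget matters three times as much
# as any other parameter (made-up weights).
example_weights = {p: 1.0 for p in PARAMETERS}
example_weights["resource_budget"] = 3.0

# Illustrative compatibility values for one dimension (model family); placeholders only.
example_options = {
    "xgboost": dict(zip(PARAMETERS, (0.9, 0.9, 0.8, 0.7, 0.8))),
    "bilstm": dict(zip(PARAMETERS, (0.5, 0.2, 0.4, 0.5, 0.8))),
}

if __name__ == "__main__":
    for name, score in rank_options(example_weights, example_options):
        print(f"{name}: {score:.3f}")
```

Repeating `rank_options` for each of the nine configuration dimensions would yield one ranked list per dimension. Changing the weights to reflect a different deployment context can change the ranking, which is the context dependence the abstract emphasises.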