🤖 AI Summary
To address the limitations of signature-based detection and the delayed response to malware attacks on Windows endpoints, this paper proposes a proactive defense method that predicts the malware susceptibility of individual machines. Leveraging tens of millions of real-world Windows Defender logs, we construct a susceptibility modeling framework that integrates historical behavioral patterns with fine-grained, time-series system features. We design a lightweight, transferable architecture suitable for enterprise-scale deployment and validate the predictive contribution of key features via causal feature analysis. Using XGBoost and LightGBM ensemble models with interpretability-driven feature engineering, our approach achieves 98.3% accuracy, 92.7% early-vulnerability recall, and a false positive rate below 0.4% on production data, with per-sample inference latency under 15 ms.
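The three headline metrics above (accuracy, recall, and false positive rate) all derive from the same binary confusion matrix. The following is a minimal sketch of how they relate, assuming that label 1 means "machine later infected" and label 0 means "machine stayed clean". The function name `binary_metrics`, the `BinaryMetrics` container, and the toy labels are hypothetical illustrations, not the paper's evaluation code or data.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BinaryMetrics:
    """Hypothetical container for the three metrics named in the summary."""
    accuracy: float
    recall: float
    false_positive_rate: float


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> BinaryMetrics:
    """Compute accuracy, recall, and false positive rate for 0/1 labels.

    Assumption: 1 = susceptible/infected machine, 0 = clean machine.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    total = tp + tn + fp + fn
    # Guard each ratio against an empty denominator.
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return BinaryMetrics(accuracy, recall, false_positive_rate)


# Toy labels invented for illustration only (not data from the paper).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Reporting recall and false positive rate alongside accuracy matters when infected machines are a small minority of the fleet, since a model that flags nothing can still score high accuracy.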
📝 Abstract
In an era of escalating cyber threats, malware poses significant risks to individuals and organizations, potentially leading to data breaches, system failures, and substantial financial losses. This study addresses the urgent need for effective malware detection strategies by applying Machine Learning (ML) techniques to extensive datasets collected from Microsoft Windows Defender. Our research aims to develop an advanced ML model that accurately predicts a machine's vulnerability to malware based on that machine's specific conditions. Moving beyond traditional signature-based detection methods, we incorporate historical data and innovative feature engineering to enhance detection capabilities. This study makes four contributions: first, it advances existing malware detection techniques by employing sophisticated ML algorithms; second, it uses a large-scale, real-world dataset to ensure the applicability of its findings; third, it highlights the importance of feature analysis in identifying key indicators of malware infection; and fourth, it proposes models that can be adapted for enterprise environments, offering a proactive approach to safeguarding extensive networks against emerging threats. We aim to improve cybersecurity resilience, providing critical insights for practitioners in the field and addressing the evolving challenges posed by malware in today's digital landscape. Finally, we discuss the results, the insights they provide, and our conclusions.