🤖 AI Summary
This study addresses a limitation of existing macOS malware detection approaches: they do not adequately model the platform's system characteristics or the Mach-O binary format. The work proposes a static analysis framework that feeds macOS-specific features (embedded code-signing certificates, entitlement declarations, persistence mechanisms, and key system API usage) into a machine learning–based detector. Evaluated on a dataset of 41,129 samples (11,413 benign and 29,716 malicious), the approach achieves a 98.50% detection rate, an average improvement of 16% over existing approaches. On a separate set of 9,000 fresh samples, it maintains a 99.50% detection rate and outperforms the state of the art by 50%. Removing the domain-specific features lowers detection performance on these new samples by 15.92%, which indicates that they are crucial for generalizing to novel malware.
📝 Abstract
Despite the growing popularity of macOS among end users and in enterprise systems, malware research has primarily focused on the Windows and Android operating systems, leaving the problem of macOS malware detection relatively unexplored. Yet the specificity of the operating system and the unique characteristics of the Mach-O file format can play a fundamental role in the classification of unknown samples, drastically increasing the detection rate. In this work, for the first time in the literature, we employ new domain-specific features, i.e., static features specific to macOS binaries, such as embedded certificates, entitlements, persistence techniques, and key system APIs, to train a machine learning malware detector. We perform a comprehensive experimental evaluation on a novel dataset of 41,129 samples, comprising 11,413 benign and 29,716 malicious executables, and demonstrate that our solution achieves state-of-the-art detection performance (98.50%), outperforming all existing approaches with an average improvement of 16% in detection rate. We also provide an in-depth analysis of the importance of the individual features, showing that our detector effectively leverages the new domain-specific features. To evaluate how well the detector generalizes over time, we then perform a real-world evaluation on a new dataset of 9,000 fresh macOS executables. The results show that (i) our detector maintains a very high detection rate (99.50%), (ii) it outperforms the state of the art by 50%, and (iii) the domain-specific features are crucial for generalizing to novel malware samples, as their removal leads to a 15.92% drop in detection performance. Finally, we release our dataset to the research community.
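To make the idea of domain-specific static features concrete, the following is a minimal sketch of how such signals could be encoded as a fixed-length binary vector for a classifier. It is not the authors' pipeline: the abstract does not list the actual features, so the vocabularies `ENTITLEMENTS`, `PERSISTENCE_MARKERS`, and `SYSTEM_APIS`, the `StaticReport` container, and the `to_feature_vector` function are all hypothetical names chosen for illustration. The sketch also assumes the Mach-O file has already been parsed by some other tool.

```python
from dataclasses import dataclass, field

# Hypothetical vocabularies, for illustration only; the paper's actual
# feature set is not given in the abstract.
ENTITLEMENTS = [
    "com.apple.security.app-sandbox",
    "com.apple.security.get-task-allow",
    "com.apple.security.cs.disable-library-validation",
]
PERSISTENCE_MARKERS = ["Library/LaunchAgents", "Library/LaunchDaemons"]
SYSTEM_APIS = ["_ptrace", "_sysctl", "_dlopen"]


@dataclass
class StaticReport:
    """Already-parsed static facts about one Mach-O sample (assumed input)."""
    is_signed: bool = False
    entitlements: set = field(default_factory=set)
    strings: list = field(default_factory=list)
    imported_symbols: set = field(default_factory=set)


def to_feature_vector(report):
    """Encode a report as a fixed-length list of 0/1 features."""
    # One flag for the presence of an embedded code signature.
    vec = [int(report.is_signed)]
    # One flag per entitlement key in the vocabulary.
    vec += [int(e in report.entitlements) for e in ENTITLEMENTS]
    # One flag per persistence marker found in any embedded string.
    vec += [int(any(m in s for s in report.strings)) for m in PERSISTENCE_MARKERS]
    # One flag per imported system API symbol in the vocabulary.
    vec += [int(a in report.imported_symbols) for a in SYSTEM_APIS]
    return vec


sample = StaticReport(
    is_signed=False,
    entitlements={"com.apple.security.get-task-allow"},
    strings=["~/Library/LaunchAgents/com.example.agent.plist"],
    imported_symbols={"_ptrace", "_dlopen"},
)
print(to_feature_vector(sample))  # [0, 0, 1, 0, 1, 0, 1, 0, 1]
```

Vectors of this kind could then be passed to any standard classifier. Which model the authors use, and how they weight or select features, is not stated in the abstract.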