Reassessing feature-based Android malware detection in a contemporary context

📅 2023-01-30

📈 Citations: 5

✨ Influential: 0

🤖 AI Summary

Problem: Feature-based Android malware detection methods published over the past decade have not been compared on a level playing field using a contemporary Android environment and dataset. Method: The authors reimplemented 18 feature-based detection studies published between 2013 and 2023 and reevaluated them on a balanced dataset of 124,000 apps, covering static features (e.g., API calls, opcodes), dynamic features (e.g., network traffic), and traditional and ensemble ML models (e.g., RF, SVM). Contribution/Results: Feature-based approaches still reach detection accuracies above 98%. Features from dynamic analysis add only a small benefit over static features, and simpler models often outperform more complex ones. API calls and opcodes are the most productive static features, network traffic is the most predictive dynamic feature, and ensembles offer an efficient way to combine models trained on static and dynamic features. Together, the results suggest that simple, fast ML approaches remain an effective basis for Android malware detection.

📝 Abstract

We report the findings of a reimplementation of 18 foundational studies in feature-based machine learning for Android malware detection, published during the period 2013-2023. These studies are reevaluated on a level playing field using a contemporary Android environment and a balanced dataset of 124,000 applications. Our findings show that feature-based approaches can still achieve detection accuracies beyond 98%, despite a considerable increase in the size of the underlying Android feature sets. We observe that features derived through dynamic analysis yield only a small benefit over those derived from static analysis, and that simpler models often outperform more complex models. We also find that API calls and opcodes are the most productive static features within our evaluation context, that network traffic is the most predictive dynamic feature, and that ensemble models provide an efficient means of combining models trained on static and dynamic features. Together, these findings suggest that simple, fast machine learning approaches can still be an effective basis for malware detection, despite the increasing focus on slower, more expensive machine learning models in the literature.
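
The abstract's point about ensembles combining models trained on static and dynamic features can be illustrated with a small sketch. This is not the paper's pipeline: it assumes a toy Bernoulli naive Bayes classifier, a simple weighted average of the two models' probabilities (late fusion), and made-up training data. The feature names, the `BernoulliNB` class, the `ensemble_proba` helper, and the `w_static` weight are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


class BernoulliNB:
    """Tiny Bernoulli naive Bayes over sets of binary features (illustrative only)."""

    def fit(self, samples, labels):
        self.vocab = sorted(set().union(*samples))
        self.classes = sorted(set(labels))
        self.prior = {}
        self.likelihood = {}
        for c in self.classes:
            rows = [s for s, y in zip(samples, labels) if y == c]
            self.prior[c] = len(rows) / len(samples)
            counts = Counter(f for s in rows for f in s)
            # Laplace smoothing keeps unseen features from producing log(0).
            self.likelihood[c] = {
                f: (counts[f] + 1) / (len(rows) + 2) for f in self.vocab
            }
        return self

    def predict_proba(self, sample):
        """Return P(label == 1 | sample), where label 1 means malware."""
        log_post = {}
        for c in self.classes:
            lp = math.log(self.prior[c])
            for f in self.vocab:
                p = self.likelihood[c][f]
                lp += math.log(p if f in sample else 1 - p)
            log_post[c] = lp
        top = max(log_post.values())
        exp = {c: math.exp(v - top) for c, v in log_post.items()}
        return exp[1] / sum(exp.values())


def ensemble_proba(static_model, dynamic_model, static_feats, dynamic_feats,
                   w_static=0.5):
    """Late fusion: weighted average of the two models' malware probabilities."""
    p_static = static_model.predict_proba(static_feats)
    p_dynamic = dynamic_model.predict_proba(dynamic_feats)
    return w_static * p_static + (1 - w_static) * p_dynamic


# Toy, hand-written data (assumption): 1 = malware, 0 = benign.
labels = [1, 1, 0, 0]
static_train = [  # e.g. API calls seen in the app's code
    {"sendTextMessage", "getDeviceId"},
    {"getDeviceId", "exec"},
    {"openFileInput"},
    {"startActivity"},
]
dynamic_train = [  # e.g. coarse network-traffic indicators
    {"raw_ip_connection"},
    {"raw_ip_connection", "dns_burst"},
    {"https_cdn"},
    {"https_cdn"},
]

static_model = BernoulliNB().fit(static_train, labels)
dynamic_model = BernoulliNB().fit(dynamic_train, labels)

suspicious = ensemble_proba(static_model, dynamic_model,
                            {"getDeviceId", "exec"}, {"raw_ip_connection"})
benign = ensemble_proba(static_model, dynamic_model,
                        {"startActivity"}, {"https_cdn"})
print(f"suspicious app score: {suspicious:.2f}")
print(f"benign app score: {benign:.2f}")
```

Each base model here sees only one feature family, so either can be trained, replaced, or skipped independently (for example, when no dynamic trace is available), which is one reason a late-fusion ensemble of simple models can be inexpensive to run.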

Problem

Research questions and friction points this paper is trying to address.

Reassessing Android malware detection using feature-based machine learning

Evaluating static vs dynamic analysis features for malware identification

Comparing simple and complex model performance in malware detection

Innovation

Methods, ideas, or system contributions that make the work stand out.

Reimplemented 18 foundational malware detection studies

Evaluated static and dynamic features on a balanced dataset of 124,000 apps

Found that simpler models often outperform more complex ones, and that ensembles efficiently combine static- and dynamic-feature models

🔎 Similar Papers

Revisiting Static Feature-Based Android Malware Detection