🤖 AI Summary
Existing graph-based Android malware classifiers lose up to 45% accuracy on unseen variants from known families, which exposes shallow semantic modeling and poor generalization under distribution shift. Method: We propose a semantic enrichment framework that (i) constructs a heterogeneous graph representation integrating function-level metadata and code embeddings generated by large language models; (ii) provides an elastic semantic enhancement mechanism that accepts multi-source feature inputs, including cases where some sources are unavailable; and (iii) jointly optimizes graph structure and semantic representation while remaining compatible with diverse GNN backbones and adaptation-based detection strategies. Contribution/Results: We release MalNet-Tiny-Common and MalNet-Tiny-Distinct, two new benchmarks targeting cross-family and temporal distribution shifts. Experiments show gains of up to 8% across multiple GNN backbones under distribution shift, along with improved robustness and better generalization to previously unseen variants.
📝 Abstract
Graph-based malware classifiers can achieve over 94% accuracy on standard Android datasets, yet we find they suffer accuracy drops of up to 45% when evaluated on previously unseen malware variants from the same family, a scenario where strong generalization would typically be expected. This highlights a key limitation in existing approaches: both the model architectures and their structure-only representations often fail to capture deeper semantic patterns. In this work, we propose a robust semantic enrichment framework that enhances function call graphs with contextual features, including function-level metadata and, when available, code embeddings derived from large language models. The framework is designed to operate under real-world constraints where feature availability is inconsistent, and it supports flexible integration of semantic signals. To evaluate generalization under realistic domain and temporal shifts, we introduce two new benchmarks, MalNet-Tiny-Common and MalNet-Tiny-Distinct, constructed using malware family partitioning to simulate cross-family generalization and evolving threat behavior. Experiments across multiple graph neural network backbones show that our method improves classification performance by up to 8% under distribution shift and consistently enhances robustness when integrated with adaptation-based methods. These results offer a practical path toward building resilient malware detection systems in evolving threat environments.
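The abstract does not say how the framework copes with inconsistent feature availability, so the following is only a minimal sketch of one common approach, not the authors' method. It assumes that missing sources are zero-filled and that two presence flags are appended to each node's feature vector. The names `enrich_node` and `enrich_graph`, the dimensions `META_DIM` and `EMBED_DIM`, and the example function names and values are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical feature sizes, chosen only for illustration.
META_DIM = 3   # e.g. function-level metadata fields
EMBED_DIM = 4  # e.g. a (tiny) code embedding from a language model


def enrich_node(
    metadata: Optional[Sequence[float]],
    embedding: Optional[Sequence[float]],
) -> List[float]:
    """Concatenate optional feature sources, zero-filling missing ones.

    Two trailing flags record which sources were present, so a downstream
    model can tell a genuine zero vector apart from a missing one.
    """
    if metadata is not None and len(metadata) != META_DIM:
        raise ValueError("metadata has the wrong dimension")
    if embedding is not None and len(embedding) != EMBED_DIM:
        raise ValueError("embedding has the wrong dimension")
    meta = list(metadata) if metadata is not None else [0.0] * META_DIM
    emb = list(embedding) if embedding is not None else [0.0] * EMBED_DIM
    flags = [float(metadata is not None), float(embedding is not None)]
    return meta + emb + flags


def enrich_graph(
    call_edges: Sequence[Tuple[str, str]],
    metadata: Dict[str, Sequence[float]],
    embeddings: Dict[str, Sequence[float]],
) -> Dict[str, List[float]]:
    """Build one fixed-width feature vector per function in the call graph."""
    nodes = sorted({name for edge in call_edges for name in edge})
    return {n: enrich_node(metadata.get(n), embeddings.get(n)) for n in nodes}


# Toy function call graph: caller -> callee (made-up names and values).
edges = [("main", "send_sms"), ("main", "read_contacts")]
metadata = {"main": [1.0, 0.0, 12.0], "send_sms": [0.0, 1.0, 3.0]}
embeddings = {"main": [0.1, 0.2, 0.3, 0.4]}  # only one node has an embedding

features = enrich_graph(edges, metadata, embeddings)
```

Every node ends up with a vector of the same width regardless of which sources were available, so the result can be passed to any GNN backbone as a node feature matrix. A learned placeholder or a source-specific encoder would be an equally plausible design.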