Detecting Concept Drift in Evolving Malware Families Using Rule-Based Classifier Representations

📅 2026-04-24

🤖 AI Summary

This study addresses the degradation in malware family classification performance caused by concept drift as malware families evolve. The authors propose a structural detection approach based on decision tree rule sets: classifiers are trained over temporal windows of the EMBER2024 dataset, and drift is quantified with rule-level metrics (feature importance, prediction agreement, activation stability, and coverage) that are then correlated with accuracy degradation and data distribution shift. The work is presented as the first systematic application of such rule-level metrics to malware concept drift analysis. Across six malware families, fixed two-month windowing with feature-level Pearson correlation is the most reliable configuration, being the only one in which all family pairs show positive drift-accuracy correlations. Compared against the RIPPER and Transcendent baselines, the methods prove complementary, with no single approach dominating across all pairs.

📝 Abstract

This work proposes a structural approach to concept drift detection in malware classification using decision tree rule sets. Classifiers are trained across temporal windows on the EMBER2024 dataset, and drift is quantified by comparing extracted rule representations using feature importance, prediction agreement, activation stability, and coverage metrics. These metrics are correlated with both accuracy degradation and data distribution shift as complementary drift indicators. The approach is evaluated across six malware families using fixed-interval and clustering-based windowing in family-vs-benign and family-vs-family settings, and compared against RIPPER and Transcendent baselines. Results show that fixed two-month windowing with feature-level Pearson correlation is the most reliable configuration, being the only one where all family pairs produce positive drift-accuracy correlations. The methods are complementary: no single approach dominates across all pairs.
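
To make the pipeline in the abstract concrete, here is a minimal sketch of one possible reading of "fixed two-month windowing with feature-level Pearson correlation". It is not the authors' implementation. The function names, the use of half the L1 distance as the feature-importance drift score, the definition of accuracy degradation as the drop between consecutive windows, and all numbers are assumptions made for illustration only.

```python
from math import sqrt

def two_month_window(year, month, start_year, start_month):
    """Index of the fixed two-month window containing (year, month).

    Hypothetical helper: windows are counted from an assumed start date.
    """
    months_elapsed = (year - start_year) * 12 + (month - start_month)
    return months_elapsed // 2

def importance_drift(prev, curr):
    """Half the L1 distance between two normalized feature-importance maps.

    Assumed drift score: 0.0 means identical importances, 1.0 means the two
    windows rely on completely disjoint features.
    """
    features = set(prev) | set(curr)
    return 0.5 * sum(abs(prev.get(f, 0.0) - curr.get(f, 0.0)) for f in features)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)

# Made-up per-window feature importances (each sums to 1) and accuracies.
# In the paper these would come from decision trees trained per window.
importances = [
    {"f_a": 0.6, "f_b": 0.4},
    {"f_a": 0.5, "f_b": 0.5},
    {"f_a": 0.2, "f_b": 0.3, "f_c": 0.5},
    {"f_a": 0.1, "f_c": 0.9},
]
accuracies = [0.95, 0.94, 0.85, 0.80]

# Drift and accuracy drop between each pair of consecutive windows.
drifts = [importance_drift(a, b) for a, b in zip(importances, importances[1:])]
drops = [a - b for a, b in zip(accuracies, accuracies[1:])]

r = pearson(drifts, drops)
print(f"drift-accuracy Pearson r = {r:.3f}")
```

A positive `r` on real data would correspond to the "positive drift-accuracy correlation" the abstract reports for the fixed two-month configuration; with only three window pairs, as in this toy input, the value carries no statistical weight.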

Problem

Research questions and friction points this paper is trying to address.

concept drift

malware classification

evolving malware families

rule-based classifiers

data distribution shift

Innovation

Methods, ideas, or system contributions that make the work stand out.

concept drift detection

rule-based classifier

malware classification