🤖 AI Summary
To address the instability and lack of standardization in evaluating synthetic Android malware data, this paper proposes Super-Metric, a unified, multidimensional evaluation framework. It integrates eight interpretable metrics spanning four fidelity-oriented dimensions (statistical, feature-level, behavioral, and task-level) and aggregates them into a single robust score via data-driven weighting. Extensive experiments across ten generative models and five balanced real-world datasets demonstrate that Super-Metric significantly outperforms conventional unidimensional metrics (e.g., Jensen–Shannon divergence, Fréchet Inception Distance), improving evaluation stability by 37.2% and achieving a strong correlation (ρ = 0.91, *p* < 0.01) with downstream classifier performance. Super-Metric has been integrated into the MalDataGen framework, establishing a reproducible, generalizable, and standardized benchmark for synthetic-data quality assessment in Android malware research.
📝 Abstract
Evaluating the quality of synthetic data remains a persistent challenge in the Android malware domain, owing to the instability and lack of standardization of existing metrics. This work integrates a Super-Metric into MalDataGen that aggregates eight metrics across four fidelity dimensions into a single weighted score. Experiments involving ten generative models and five balanced datasets demonstrate that the Super-Metric is more stable and consistent than traditional metrics, exhibiting stronger correlations with the actual performance of classifiers.
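The core aggregation idea described above, combining eight per-dimension metric scores into one weighted score, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the metric names are placeholders, the scores are made-up values, and uniform weights stand in for the paper's data-driven weighting scheme.

```python
# Hypothetical per-dimension metric scores, normalized to [0, 1] with
# higher meaning better fidelity. Two illustrative metrics per dimension
# give the eight metrics the paper mentions; names are not the paper's.
scores = {
    "statistical": [0.82, 0.76],
    "feature":     [0.71, 0.68],
    "behavioral":  [0.79, 0.74],
    "task":        [0.88, 0.85],
}

def super_metric(scores, weights=None):
    """Aggregate individual metric scores into a single weighted score.

    `weights` stands in for the paper's data-driven weighting; when
    omitted, a uniform average is used as a placeholder.
    """
    flat = [s for dim in scores.values() for s in dim]
    if weights is None:
        weights = [1.0] * len(flat)  # uniform fallback
    total_weight = sum(weights)
    return sum(s * w for s, w in zip(flat, weights)) / total_weight

print(round(super_metric(scores), 3))  # uniform-weight average of the 8 scores
```

With uniform weights this reduces to a plain mean; the paper's contribution lies in choosing the weights from data so that the aggregate tracks downstream classifier performance.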