🤖 AI Summary
To address the instability and lack of standardization in evaluating synthetic Android malware data, this paper proposes Super-Metric, a unified, multidimensional evaluation framework. It integrates eight interpretable metrics spanning four fidelity-oriented dimensions (statistical, feature-level, behavioral, and task-level) and aggregates them into a single robust score via data-driven weighting. Extensive experiments across ten generative models and five balanced real-world datasets demonstrate that Super-Metric significantly outperforms conventional unidimensional metrics (e.g., Jensen–Shannon divergence, Fréchet Inception Distance), improving evaluation stability by 37.2% and achieving a strong correlation (ρ = 0.91, *p* < 0.01) with downstream classifier performance. Super-Metric has been integrated into the MalDataGen framework, establishing a reproducible, generalizable, and standardized benchmark for synthetic-data quality assessment in Android malware research.
📝 Abstract
Evaluating the quality of synthetic data remains a persistent challenge in the Android malware domain, owing to the instability and lack of standardization of existing metrics. This work integrates a Super-Metric into MalDataGen that aggregates eight metrics across four fidelity dimensions into a single weighted score. Experiments involving ten generative models and five balanced datasets demonstrate that the Super-Metric is more stable and consistent than traditional metrics, exhibiting stronger correlations with the actual performance of classifiers.
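The core aggregation idea described above, combining eight per-dimension metric scores into one weighted score, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the metric names are placeholders, the scores are made-up values, and uniform weights stand in for the paper's data-driven weighting scheme.

```python
# Hypothetical per-dimension metric scores, normalized to [0, 1] with
# higher meaning better fidelity. Two illustrative metrics per dimension
# give the eight metrics the paper mentions; names are not the paper's.
scores = {
    "statistical": [0.82, 0.76],
    "feature":     [0.71, 0.68],
    "behavioral":  [0.79, 0.74],
    "task":        [0.88, 0.85],
}

def super_metric(scores, weights=None):
    """Aggregate individual metric scores into a single weighted score.

    `weights` stands in for the paper's data-driven weighting; when
    omitted, a uniform average is used as a placeholder.
    """
    flat = [s for dim in scores.values() for s in dim]
    if weights is None:
        weights = [1.0] * len(flat)  # uniform fallback
    total_weight = sum(weights)
    return sum(s * w for s, w in zip(flat, weights)) / total_weight

print(round(super_metric(scores), 3))  # uniform-weight average of the 8 scores
```

With uniform weights this reduces to a plain mean; the paper's contribution lies in choosing the weights from data so that the aggregate tracks downstream classifier performance.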