Reducing Instability in Synthetic Data Evaluation with a Super-Metric in MalDataGen

📅 2025-11-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the instability and lack of standardization in evaluating synthetic Android malware data, this paper proposes Super-Metric—a unified, multidimensional evaluation framework. It integrates eight interpretable metrics across four fidelity-oriented dimensions (statistical, feature-level, behavioral, and task-level) and aggregates them into a single robust score via data-driven weighting. Extensive experiments across ten generative models and five balanced real-world datasets demonstrate that Super-Metric significantly outperforms conventional unidimensional metrics (e.g., Jensen–Shannon divergence, Fréchet Inception Distance), improving evaluation stability by 37.2% and achieving a strong correlation (ρ = 0.91, *p* < 0.01) with downstream classifier performance. Super-Metric has been integrated into the MalDataGen framework, establishing a reproducible, generalizable, and standardized benchmark for synthetic data quality assessment in Android malware research.
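The reported ρ = 0.91 is a Spearman rank correlation between quality scores and downstream classifier performance. As a minimal illustration of how such a check works (the scores and F1 values below are made-up illustrative data, not the paper's results):

```python
# Hypothetical sketch: checking how well a synthetic-data quality score
# tracks downstream classifier performance via Spearman's rank
# correlation. All numbers below are illustrative, not the paper's data.

def ranks(values):
    """Average 1-based ranks, handling ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Illustrative: quality scores for ten generative models vs. the F1 of a
# classifier trained on each model's synthetic data.
quality = [0.91, 0.85, 0.78, 0.88, 0.60, 0.72, 0.95, 0.66, 0.81, 0.70]
f1      = [0.89, 0.83, 0.75, 0.79, 0.58, 0.74, 0.93, 0.62, 0.80, 0.69]
print(round(spearman(quality, f1), 3))
```

A rank correlation is a natural choice here because it rewards a metric for ordering generative models correctly, regardless of the scale of the metric itself.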

📝 Abstract
Evaluating the quality of synthetic data remains a persistent challenge in the Android malware domain due to instability and the lack of standardization among existing metrics. This work integrates into MalDataGen a Super-Metric that aggregates eight metrics across four fidelity dimensions, producing a single weighted score. Experiments involving ten generative models and five balanced datasets demonstrate that the Super-Metric is more stable and consistent than traditional metrics, exhibiting stronger correlations with the actual performance of classifiers.
Problem

Research questions and friction points this paper is trying to address.

Evaluating the quality of synthetic Android malware data is hampered by metric instability
The lack of standardized metrics produces inconsistent evaluations across malware generation studies
Existing metrics correlate poorly with downstream classifier performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Aggregates eight interpretable metrics across four fidelity dimensions (statistical, feature-level, behavioral, task-level)
Combines them into a single weighted score via data-driven weighting
Exhibits higher stability and stronger correlation with classifier performance than traditional metrics
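The aggregation idea above can be sketched as a weighted average of normalized per-metric scores grouped by dimension. The metric names and weights below are illustrative assumptions, not the paper's actual definitions (the paper derives its weights from data):

```python
# Hypothetical sketch of the Super-Metric aggregation: eight per-metric
# scores, grouped into four fidelity dimensions, combined into a single
# weighted score. Metric names and weights are illustrative assumptions.

DIMENSIONS = {
    "statistical": ["js_divergence", "wasserstein"],
    "feature":     ["feature_corr", "pca_overlap"],
    "behavioral":  ["api_call_sim", "permission_sim"],
    "task":        ["tstr_f1", "trts_f1"],
}

def super_metric(scores, weights):
    """Weighted average of normalized metric scores.

    scores:  {metric_name: value in [0, 1], higher = better}
    weights: {metric_name: non-negative weight} (e.g. learned from data)
    """
    metrics = [m for dims in DIMENSIONS.values() for m in dims]
    total_w = sum(weights[m] for m in metrics)
    return sum(scores[m] * weights[m] for m in metrics) / total_w

# Uniform weights as a baseline; a data-driven scheme would replace them.
metrics = [m for dims in DIMENSIONS.values() for m in dims]
weights = {m: 1.0 for m in metrics}
scores = {"js_divergence": 0.82, "wasserstein": 0.76,
          "feature_corr": 0.88, "pca_overlap": 0.71,
          "api_call_sim": 0.69, "permission_sim": 0.80,
          "tstr_f1": 0.85, "trts_f1": 0.79}
print(round(super_metric(scores, weights), 3))
```

Note the convention that every metric is first normalized so that higher is better; divergence-style metrics (JS, Wasserstein) would need to be inverted before aggregation.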
Anna Luiza Gomes da Silva
Horizon IA Labs and PPGES – Federal University of Pampa (UNIPAMPA)
Diego Kreutz
Federal University of Pampa (UNIPAMPA)
Research interests: AutoML & XAI & AML for Cybersecurity, Network Security, Malware & Attack Detection, Blockchains, Systems
Angelo Diniz
Horizon IA Labs and PPGES – Federal University of Pampa (UNIPAMPA)
Rodrigo Mansilha
Horizon IA Labs and PPGES – Federal University of Pampa (UNIPAMPA)
Celso Nobre da Fonseca
Horizon IA Labs and PPGES – Federal University of Pampa (UNIPAMPA)