🤖 AI Summary
Existing evaluation methods for adversarial attacks on tabular data suffer from three critical limitations: (1) excessive sample distortion, (2) violation of inherent feature dependencies, and (3) lack of interpretability—hindering reliable assessment of attack efficacy and robustness.
Method: This paper introduces the first evaluation framework that jointly assesses coherence and feature consistency. It comprises: (1) a novel identifiability metric grounded in anomaly detection; (2) SHAP-based attribution to localize decision inconsistency; and (3) a dependency-aware collaborative perturbation algorithm preserving inter-feature structural relationships.
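To make component (1) concrete, below is a minimal Python sketch of an anomaly-based identifiability metric, assuming an IsolationForest detector fit on clean data; the function name, detector choice, and hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
# Hypothetical sketch: score how identifiable adversarial samples are
# by fitting an anomaly detector on clean data and checking how many
# adversarial samples it flags as outliers.
import numpy as np
from sklearn.ensemble import IsolationForest

def identifiability_score(X_clean: np.ndarray, X_adv: np.ndarray) -> float:
    """Fraction of adversarial samples flagged as anomalous relative to
    the clean data distribution (higher = easier to identify)."""
    detector = IsolationForest(n_estimators=200, random_state=0).fit(X_clean)
    flags = detector.predict(X_adv)  # +1 for inliers, -1 for outliers
    return float(np.mean(flags == -1))
```

A higher score indicates that the adversarial samples deviate more from the clean feature distribution and are therefore more distinguishable.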
Contribution/Results: Extensive experiments across multiple benchmark tabular datasets demonstrate that the framework effectively exposes trade-offs among attack risk, query cost, and adversarial sample quality. It significantly enhances evaluation credibility and provides a reusable, quantitative benchmark and systematic assessment paradigm for tabular adversarial machine learning.
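Component (3) of the method, dependency-aware perturbation, can be illustrated with a hedged sketch: when a source feature is perturbed, features that are deterministic functions of it are recomputed so the sample stays coherent. The feature semantics and dependency map below are assumptions for illustration only, not the paper's actual algorithm.

```python
# Hypothetical sketch of dependency-aware perturbation: perturbing one
# feature triggers recomputation of features derived from it, so
# inter-feature structural relationships are preserved.
import numpy as np

# Illustrative dependency: feature 2 ("debt_ratio") is derived from
# feature 0 ("debt") and feature 1 ("income").
DERIVED = {2: lambda x: x[0] / max(x[1], 1e-8)}

def perturb_with_dependencies(x: np.ndarray, idx: int, delta: float) -> np.ndarray:
    """Perturb feature `idx`, then recompute all derived features."""
    x_adv = x.copy()
    x_adv[idx] += delta
    for j, fn in DERIVED.items():
        x_adv[j] = fn(x_adv)  # keep inter-feature relations consistent
    return x_adv

x = np.array([500.0, 2000.0, 0.25])
print(perturb_with_dependencies(x, 0, 100.0))  # debt_ratio updated to 0.3
```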
📝 Abstract
Machine learning models trained on tabular data are vulnerable to adversarial attacks, even in realistic scenarios where attackers have access only to the model's outputs. Researchers evaluate such attacks using metrics like success rate, perturbation magnitude, and query count. However, unlike other data domains, the tabular domain contains complex interdependencies among features, presenting a unique aspect that should be evaluated: the need for the attack to generate coherent samples and ensure feature consistency for indistinguishability. Currently, there is no established methodology for evaluating adversarial samples based on these criteria. In this paper, we address this gap by proposing new evaluation criteria tailored to the quality of tabular attacks: we define an anomaly-based framework to assess the distinguishability of adversarial samples and utilize the SHAP explainability technique to identify inconsistencies in the model's decision-making process caused by adversarial samples. These criteria could form the basis for potential detection methods and be integrated into established evaluation metrics for assessing attack quality. Additionally, we introduce a novel technique for perturbing dependent features while maintaining coherence and feature consistency within the sample. We compare different attack strategies, examining black-box query-based attacks and transferability-based gradient attacks across four target models. Our experiments, conducted on benchmark tabular datasets, reveal significant differences between the examined attack strategies in terms of attacker risk and effort and attack quality. The findings provide valuable insights into the strengths, limitations, and trade-offs of various adversarial attacks in the tabular domain, laying a foundation for future research on attack and defense development.
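As a hedged illustration of the SHAP-based inconsistency check described in the abstract, the sketch below compares per-feature attributions for a clean sample and its adversarial counterpart; the choice of `KernelExplainer` and the absolute-shift measure are assumptions, not necessarily the authors' exact procedure.

```python
# Hypothetical sketch: localize decision inconsistency by measuring how
# much each feature's SHAP attribution shifts under adversarial
# perturbation. model_predict is assumed to return one score per sample.
import numpy as np
import shap

def attribution_shift(model_predict, background, x, x_adv):
    """Per-feature absolute change in SHAP attribution between a clean
    sample and its adversarial counterpart."""
    explainer = shap.KernelExplainer(model_predict, background)
    phi = explainer.shap_values(x.reshape(1, -1))
    phi_adv = explainer.shap_values(x_adv.reshape(1, -1))
    return np.abs(np.asarray(phi_adv) - np.asarray(phi)).ravel()
```

Features with large attribution shifts localize where the adversarial sample distorts the model's decision-making process, which is the inconsistency signal the evaluation criteria build on.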