🤖 AI Summary
Existing neural network testing methods struggle to efficiently generate diverse failure-inducing test cases while preserving input semantics and staying close to the original data distribution. This work proposes BayesWarp, which combines interpretable saliency analysis with uncertainty-aware Bayesian optimization: saliency identifies the decision-critical input regions, and Bayesian optimization adaptively guides local, semantics-preserving mutations within those regions. Under a fixed mutation budget, the method improves failure discovery, failure diversity, test case quality, and critical neuron coverage on MNIST, CIFAR-10, and ImageNet across six models. Fine-tuning on the generated failure cases further improves model performance.
📝 Abstract
As neural networks are increasingly deployed in safety-critical domains, testing is essential to evaluate and improve their reliability. Existing testing methods, whether black-box or white-box, primarily rely on global mutation or coverage-guided strategies, both of which struggle to efficiently uncover diverse model failures while staying close to the original data distribution and semantics. We propose BayesWarp, a testing framework that addresses this limitation in two ways: it mutates decision-critical input regions identified via interpretable saliency techniques, and it adaptively guides the testing process with an uncertainty-aware Bayesian Optimization strategy. Together, these enable the discovery of diverse failures while preserving distributional and semantic proximity to the original data. Evaluation on MNIST, CIFAR-10, and ImageNet across six neural network models shows that BayesWarp improves failure discovery, failure diversity, test case quality, and critical neuron coverage under a fixed mutation budget. Moreover, fine-tuning with the generated failure cases improves model performance.
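
To make the two ingredients concrete, here is a minimal sketch of saliency-guided local mutation steered by Bayesian optimization. It is not BayesWarp's implementation: the abstract does not name the saliency technique, surrogate model, or acquisition function, so everything below is an assumption chosen for illustration. The sketch uses a toy linear classifier (`model_margin`), occlusion-based saliency, a Gaussian process surrogate with an RBF kernel, and an upper-confidence-bound (UCB) acquisition. All function names and parameters (`search_failure`, `k`, `eps`, `kappa`) are hypothetical.

```python
import math
import random

def model_margin(x, w, b):
    """Toy stand-in for a network: a positive margin means 'correctly classified'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def occlusion_saliency(x, w, b):
    """Score each feature by how much zeroing it changes the margin (assumed saliency method)."""
    base = model_margin(x, w, b)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = 0.0
        scores.append(abs(base - model_margin(occluded, w, b)))
    return scores

def rbf(a, b, length=0.2):
    """RBF kernel between two perturbation vectors."""
    d2 = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-d2 / (2 * length ** 2))

def solve(A, y):
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (M[r][n] - sum(M[r][j] * z[j] for j in range(r + 1, n))) / M[r][r]
    return z

def gp_posterior(X, y, x_new, noise=1e-4):
    """Zero-mean GP posterior mean and standard deviation at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k_star = [rbf(a, x_new) for a in X]
    mean = sum(k * a for k, a in zip(k_star, solve(K, y)))
    var = 1.0 - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, math.sqrt(max(var, 0.0))

def search_failure(x, w, b, k=2, eps=0.3, iters=10, kappa=2.0, seed=0):
    """Mutate only the k most salient features, within +/- eps, guided by UCB."""
    rng = random.Random(seed)
    scores = occlusion_saliency(x, w, b)
    idx = sorted(range(len(x)), key=lambda i: -scores[i])[:k]

    def apply(delta):
        mutated = list(x)
        for i, d in zip(idx, delta):
            mutated[i] = min(1.0, max(0.0, mutated[i] + d))  # keep a valid [0, 1] range
        return mutated

    def objective(delta):
        return -model_margin(apply(delta), w, b)  # larger value = closer to a failure

    # Initial design: the unmutated seed plus two random perturbations.
    X = [[0.0] * k] + [[rng.uniform(-eps, eps) for _ in idx] for _ in range(2)]
    y = [objective(d) for d in X]

    for _ in range(iters):
        candidates = [[rng.uniform(-eps, eps) for _ in idx] for _ in range(20)]

        def ucb(c):
            mean, std = gp_posterior(X, y, c)
            return mean + kappa * std  # uncertainty bonus favors unexplored perturbations

        best = max(candidates, key=ucb)
        X.append(best)
        y.append(objective(best))

    i_best = max(range(len(y)), key=lambda i: y[i])
    return idx, apply(X[i_best]), -y[i_best]

if __name__ == "__main__":
    x = [0.9, 0.1, 0.8, 0.2]          # toy input with features in [0, 1]
    w, b = [1.0, 0.05, -0.1, 0.02], -0.6
    idx, mutated, margin = search_failure(x, w, b)
    print("critical features:", idx)
    print("mutated input:", mutated)
    print("margin after mutation:", margin, "(negative means a failure was found)")
```

In this sketch the mutation budget is the per-feature bound `eps`, and features outside the salient set are never touched, which is one simple way to keep a mutated input close to its seed. A real image model would need a gradient- or attribution-based saliency map and a richer perturbation space.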