MuFF: Stable and Sensitive Post-training Mutation Testing for Deep Learning

📅 2025-01-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the instability, insensitivity, and low killability of mutants in post-training mutation testing for deep learning models, this paper proposes MuFF, an efficient post-training mutation technique designed to produce stable mutants. MuFF introduces (1) an automated stability check and (2) two new mutation operators, weight inhibitors and neuron inhibitors. Experiments show that MuFF's mutants are 60 and 25 percentage points more sensitive than those of DeepMutation++ and DeepCrime, respectively, are more stable than those of DeepMutation++, and differ from those of DeepCrime. MuFF also generates mutants 61 times faster than DeepCrime, so it targets sensitivity, stability, and killability together while retaining the speed of post-training mutation.

📝 Abstract

The rapid adoption of Deep Learning (DL) in a broad range of fields has led to the development of specialised testing techniques for DL systems, including DL mutation testing. However, existing post-training DL mutation techniques often generate mutants that are unstable across multiple training repetitions and multiple applications of the same mutation operator. Additionally, while extremely efficient, they generate mutants without taking into account the mutants' sensitivity and killability, resulting in a large number of ineffective mutants compared to pre-training mutants. In this paper, we present a new efficient post-training DL mutation technique, named MuFF, designed to ensure the stability of the mutants and capable of generating killable and sensitive mutants. MuFF implements an automated stability check and introduces two mutation operators, named weight and neuron inhibitors. Our extensive empirical experiments show that MuFF generates mutants with 60 and 25 percentage points higher sensitivity compared to DeepMutation++ and DeepCrime, respectively, while also producing mutants that are more stable than those of DeepMutation++ and different from the mutants of DeepCrime. Moreover, MuFF preserves the benefits of the post-training mutation technique, being 61 times faster than DeepCrime in generating mutants.
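The abstract names the two operators but does not define them, so the following is only an illustrative sketch of what inhibition at the weight and neuron level could look like on a single dense layer. The function names, the `ratio` and `factor` parameters, the random selection of targets, and the column-per-neuron weight layout are all assumptions made for this example, not MuFF's actual definitions; the stability check is not shown.

```python
import numpy as np


def inhibit_weights(weights, ratio, factor, rng):
    """Hypothetical weight inhibitor (assumption, not MuFF's definition):
    scale a randomly chosen fraction `ratio` of the weights by `factor`."""
    flat = weights.flatten()  # flatten() returns a copy, so the input stays intact
    n_selected = int(round(ratio * flat.size))
    idx = rng.choice(flat.size, size=n_selected, replace=False)
    flat[idx] *= factor
    return flat.reshape(weights.shape)


def inhibit_neurons(weights, bias, ratio, factor, rng):
    """Hypothetical neuron inhibitor (assumption, not MuFF's definition):
    dampen randomly chosen neurons by scaling their incoming weights and bias.
    Assumes `weights` has shape (n_inputs, n_neurons)."""
    w, b = weights.copy(), bias.copy()
    n_selected = int(round(ratio * w.shape[1]))
    cols = rng.choice(w.shape[1], size=n_selected, replace=False)
    w[:, cols] *= factor  # incoming weights of the selected neurons
    b[cols] *= factor     # their biases
    return w, b


# Usage: mutate a toy 4x5 dense layer with a fixed seed for reproducibility.
rng = np.random.default_rng(0)
layer_w = np.ones((4, 5))
layer_b = np.ones(5)
mutant_w = inhibit_weights(layer_w, ratio=0.5, factor=0.1, rng=rng)
mutant_nw, mutant_nb = inhibit_neurons(layer_w, layer_b, ratio=0.4, factor=0.0, rng=rng)
```

Because both functions operate on already trained parameters, producing a mutant needs no retraining, which is the property the abstract credits for the speed advantage of post-training mutation over pre-training mutation.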

Problem

Research questions and friction points this paper is trying to address.

Deep Learning

Post-Training Testing

Stability and Effectiveness Improvement

Innovation

Methods, ideas, or system contributions that make the work stand out.

MuFF

Deep Learning Model Testing

Sensitivity Improvement
