MutDafny: A Mutation-Based Approach to Assess Dafny Specifications

📅 2025-11-19
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of detecting latent weaknesses in Dafny formal specifications, that is, defects that formal verification alone does not catch. It presents MutDafny, a mutation testing tool tailored to Dafny. The tool has 32 mutation operators: some are adapted from popular mutation tools, and 14 are derived from real-world bug-fix commits in GitHub-hosted Dafny projects. Mutants are injected into the code, and any mutant that still verifies is flagged as a sign of a potentially weak specification. In an evaluation on 794 real-world Dafny programs, manual analysis of a subset of the undetected mutants identified five weak specifications (on average, one per 241 lines of code) that would benefit from strengthening. The core contributions are (i) a set of Dafny-specific mutation operators and (ii) an empirically grounded approach to detecting weak specifications.

📝 Abstract
This paper explores the use of mutation testing to reveal weaknesses in formal specifications written in Dafny. In verification-aware programming languages such as Dafny, specifications play a critical role, yet they are as prone to errors as implementations. Flaws in specifications can result in formally verified programs that deviate from the intended behavior. We present MutDafny, a tool that increases the reliability of Dafny specifications by automatically signaling potential weaknesses. Using a mutation testing approach, we introduce faults (mutations) into the code and rely on formal specifications to detect them. If a program with a mutant verifies, this may indicate a weakness in the specification. We extensively analyze mutation operators from popular tools, identifying the ones applicable to Dafny. In addition, we synthesize new operators tailored for Dafny from bug-fix commits in publicly available Dafny projects on GitHub. Drawing from both, we equipped our tool with a total of 32 mutation operators. We evaluate MutDafny's effectiveness and efficiency on a dataset of 794 real-world Dafny programs, and we manually analyze a subset of the resulting undetected mutants, identifying five weak real-world specifications (on average, one in every 241 lines of code) that would benefit from strengthening.
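
The core idea in the abstract (a mutant that still verifies points to a weak specification) can be illustrated with a small sketch. This is not MutDafny's implementation and it is not Dafny code: it is a Python stand-in in which a specification is a predicate over inputs and result, and "verification" is approximated by exhaustively checking a small bounded input domain. All names (`DOMAIN`, `verifies`, `weak_spec`, `strong_spec`, `max_original`, `max_mutant`) are hypothetical.

```python
from itertools import product

# Assumption: a small bounded input space stands in for the Dafny verifier.
DOMAIN = range(-3, 4)


def max_original(a, b):
    """Original implementation: returns the larger of a and b."""
    return a if a >= b else b


def max_mutant(a, b):
    """Mutant: the relational operator `>=` is replaced by `<=`."""
    return a if a <= b else b


def weak_spec(a, b, result):
    """Weak postcondition: the result is one of the inputs."""
    return result == a or result == b


def strong_spec(a, b, result):
    """Strengthened postcondition: also requires the result to bound both inputs."""
    return weak_spec(a, b, result) and result >= a and result >= b


def verifies(impl, spec):
    """Return True if impl satisfies spec on every input pair in DOMAIN."""
    return all(spec(a, b, impl(a, b)) for a, b in product(DOMAIN, DOMAIN))


if __name__ == "__main__":
    for name, spec in [("weak", weak_spec), ("strong", strong_spec)]:
        survived = verifies(max_mutant, spec)
        print(f"{name} spec: mutant {'survives' if survived else 'is detected'}")
```

Under the weak postcondition the mutant passes the check, which is the signal that the specification may need strengthening; adding the missing bound lets the check reject the mutant while the original implementation still passes.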

Problem

Research questions and friction points this paper is trying to address.

Assessing Dafny specification weaknesses through mutation testing
Identifying undetected mutants that point to weak formal specifications
Improving reliability of verified programs by strengthening specifications

Innovation

Methods, ideas, or system contributions that make the work stand out.

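MutDafny uses mutation testing to assess Dafny specifications: it mutates the code and relies on the specifications to detect the mutants
The tool provides 32 mutation operators, drawn from popular mutation tools and from bug-fix commits in Dafny projects on GitHub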
Identifies specification weaknesses when mutants pass verification
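
To make the notion of a mutation operator concrete, here is a minimal sketch of one generic operator, relational operator replacement, applied to Python source through the standard `ast` module (requires Python 3.9+ for `ast.unparse`). It is an assumption-laden illustration only: MutDafny operates on Dafny programs, and the `REPLACEMENTS` table and `generate_mutants` function are hypothetical names, not part of the tool or of the operator set described in the paper.

```python
import ast
import copy

# Hypothetical replacement table for a single relational-operator mutation.
REPLACEMENTS = {ast.GtE: ast.LtE, ast.Lt: ast.Gt}


def generate_mutants(source):
    """Return one mutated source string per replaceable comparison.

    Each mutant differs from the original in exactly one operator, so a
    surviving mutant can be traced back to a single change.
    """
    tree = ast.parse(source)
    # Record the positions (in ast.walk order) of comparisons we can mutate.
    targets = [
        index
        for index, node in enumerate(ast.walk(tree))
        if isinstance(node, ast.Compare) and type(node.ops[0]) in REPLACEMENTS
    ]
    mutants = []
    for index in targets:
        clone = copy.deepcopy(tree)  # keep the original tree untouched
        node = list(ast.walk(clone))[index]  # same traversal order as above
        node.ops[0] = REPLACEMENTS[type(node.ops[0])]()
        mutants.append(ast.unparse(clone))
    return mutants


if __name__ == "__main__":
    program = "def f(a, b):\n    return a if a >= b else b\n"
    for mutant in generate_mutants(program):
        print(mutant)
```

Generating one mutant per change, rather than applying all changes at once, keeps each verification result attributable to a single mutation.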