🤖 AI Summary
To address the urgent need for quality assurance of deep learning (DL) models in safety-critical applications, this paper presents a systematic survey of coverage-guided testing (CGT). Through comprehensive literature analysis and a structured taxonomy, we establish a methodological framework comprising three pillars: coverage analysis, coverage-driven input generation, and coverage optimization. We introduce, for the first time, a formal mapping between structural coverage criteria and test objectives, revealing critical challenges such as limited cross-task and cross-model generalizability. Furthermore, we unify and critically examine prevalent benchmarks, evaluation metrics, and experimental protocols, identifying key directions for standardized evaluation and integrated toolchain development. This survey delivers a clear technical roadmap for DL testing research, synthesizes pivotal open problems, and charts concrete future research avenues, thereby bridging academic innovation and industrial deployment.
📝 Abstract
As Deep Learning (DL) models are increasingly applied in safety-critical domains, ensuring their quality has become a pressing challenge in modern software engineering. Among emerging validation paradigms, coverage-guided testing (CGT) has gained prominence as a systematic framework for identifying erroneous or unexpected model behaviors. Despite growing research attention, existing CGT studies remain methodologically fragmented, limiting the understanding of current advances and emerging trends. This work addresses that gap through a comprehensive review of state-of-the-art CGT methods for DL models, covering test coverage analysis, coverage-guided test input generation, and coverage-guided test input optimization. We provide detailed taxonomies that organize these methods by methodological characteristics and application scenarios. We also investigate the evaluation practices adopted in existing studies, including benchmark datasets, model architectures, and evaluation aspects. Finally, we highlight open challenges and future directions concerning the correlation between structural coverage and testing objectives, method generalizability across tasks and models, practical deployment concerns, and the need for standardized evaluation and tool support. Overall, this survey aims to provide a roadmap for future academic research and engineering practice in DL model quality assurance.
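To make the notion of a structural coverage criterion concrete, the sketch below computes neuron coverage, one of the earliest criteria used in CGT (popularized by DeepXplore): a neuron counts as covered if its scaled activation exceeds a threshold on at least one test input. This is an illustrative toy implementation, not code from the surveyed works; the activation matrix, function name, and threshold value are all assumptions for demonstration.

```python
import numpy as np

def neuron_coverage(activations, threshold=0.5):
    """Fraction of neurons activated above `threshold` by at least one input.

    activations: (num_inputs, num_neurons) array of activations,
    assumed already scaled to [0, 1] per layer (as in DeepXplore-style setups).
    """
    # A neuron is "covered" if any input pushes it above the threshold.
    covered = (activations > threshold).any(axis=0)
    return covered.mean()

# Toy test suite: 3 inputs probing a model with 4 neurons.
acts = np.array([
    [0.9, 0.1, 0.2, 0.0],
    [0.3, 0.6, 0.1, 0.0],
    [0.2, 0.4, 0.1, 0.0],
])
print(neuron_coverage(acts))  # 2 of 4 neurons exceed 0.5 -> 0.5
```

A coverage-guided input generator would then mutate or synthesize inputs specifically to raise this score, on the hypothesis that exercising more of the model's internal states exposes more erroneous behaviors; the survey's discussion of coverage-objective correlation questions how reliably that hypothesis holds.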