🤖 AI Summary
Neural networks frequently exhibit anomalous behaviors in production that are hard to diagnose and fix, yet existing maintenance tools are heavily skewed toward the training phase and provide inadequate support for post-deployment diagnosis and root-cause analysis. Method: We adopt a qualitative research approach, conducting in-depth interviews and complementary surveys with 23 practitioners to systematically characterize real-world challenges and tooling gaps in neural network testing, debugging, and maintenance. Contribution/Results: Our study is the first to empirically identify critical shortcomings in current tooling, particularly the neglect of runtime anomaly interpretation, error attribution, and repair validation. We find that practitioners urgently need a new maintenance paradigm centered on behavioral observability, causal reasoning, and iterative repair. These findings provide empirical grounding and actionable design principles for building next-generation neural network maintenance infrastructures.
📝 Abstract
As the potential for neural networks to augment our daily lives grows, ensuring their quality through effective testing, debugging, and maintenance is essential, especially given the prospect of negative impacts from these technologies. Traditional software engineering methods, such as testing and debugging, have proven effective in maintaining software quality; however, applying them to neural networks reveals significant gaps in both research and practice. In particular, there is limited understanding of how practitioners currently address challenges in understanding and mitigating undesirable behaviors in neural networks. In our ongoing research, we explore the current state of research and practice in maintaining neural networks by gathering insights from practitioners through a preliminary study involving interviews and supporting survey responses. Our findings thus far indicate that existing tools concentrate primarily on building and training models. While these tools can be beneficial, they often fall short of helping practitioners understand and address the underlying causes of unexpected model behavior. By evaluating current procedures and identifying the limitations of traditional methodologies, our study aims to offer a developer-centric perspective on where current practices fall short and to highlight opportunities for improving maintenance support for neural networks.