🤖 AI Summary
Time-series anomaly detection models often suffer from bias and miscalibrated confidence, while existing explanation methods are typically unidirectional, instance-level, and poorly scalable. This paper proposes HILAD, a human-in-the-loop framework for anomaly detection, introducing the first bidirectional feedback mechanism: it employs scalable temporal attribution visualization to diagnose model behavior, enabling domain experts to jointly identify anomalies, interpret attributions, perform real-time corrections, and drive iterative model refinement, all within a unified interface. HILAD integrates interactive visualization, cross-dataset generalizable attribution analysis, and closed-loop feedback optimization. Evaluated on two real-world time-series datasets and in a user study, HILAD significantly improves model reliability (+23.6%), depth of human comprehension (+41.2%), and correction response speed (−68% latency), overcoming the unidirectionality and operational limitations of conventional explainability approaches.
📄 Abstract
Time series anomaly detection is a critical machine learning task for numerous applications, such as finance, healthcare, and industrial systems. However, even high-performing models may exhibit issues such as biases, leading to unreliable outcomes and misplaced confidence. While model explanation techniques, particularly visual explanations, offer valuable insights for detecting such issues by elucidating the attributions behind model decisions, many limitations remain: they are primarily instance-based, do not scale across datasets, and provide only one-directional information from the model to the human, lacking a mechanism for users to address the issues they detect. To fill these gaps, we introduce HILAD, a novel framework designed to foster dynamic and bidirectional collaboration between humans and AI for enhancing time series anomaly detection models. Through its visual interface, HILAD empowers domain experts to detect, interpret, and correct unexpected model behaviors at scale. Our evaluation on two time series datasets, together with user studies, demonstrates the effectiveness of HILAD in fostering deeper human understanding, immediate corrective actions, and improved model reliability.
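To make the idea of temporal attribution concrete, here is a minimal, self-contained sketch. It is an illustration only, not HILAD's actual method: the anomaly scorer (deviation from a moving average) and the occlusion-based attribution are both assumptions chosen for simplicity. The attribution of each timestep is the drop in the anomaly score when that point is replaced by the mean of its neighbors, so timesteps that drive the anomaly score receive the largest values.

```python
import numpy as np

def anomaly_score(window: np.ndarray, k: int = 5) -> float:
    """Toy anomaly score: mean squared deviation from a k-point moving average.
    (Illustrative stand-in for a real detector; not HILAD's model.)"""
    # Left-pad with the first value so the moving average has full length.
    pad = np.concatenate([np.full(k - 1, window[0]), window])
    ma = np.convolve(pad, np.ones(k) / k, mode="valid")
    return float(np.mean((window - ma) ** 2))

def occlusion_attribution(window: np.ndarray, k: int = 5) -> np.ndarray:
    """Per-timestep attribution: score drop when the timestep is occluded
    (replaced by the mean of its immediate neighbors)."""
    base = anomaly_score(window, k)
    attr = np.zeros(len(window))
    for t in range(len(window)):
        occluded = window.astype(float).copy()
        lo, hi = max(0, t - 1), min(len(window), t + 2)
        # Neighbor mean, excluding the occluded point itself.
        occluded[t] = (window[lo:hi].sum() - window[t]) / max(1, hi - lo - 1)
        attr[t] = base - anomaly_score(occluded, k)
    return attr

# A flat series with one spike: the spike should dominate the attribution.
series = np.zeros(60)
series[30] = 5.0
attr = occlusion_attribution(series)
print(int(np.argmax(attr)))  # the spike's index
```

In an interface like the one the abstract describes, such per-timestep attributions would be rendered as a heatmap over the series, letting an expert see at a glance which regions the model's anomaly decision rests on.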