An Empirical Evaluation of Factors Affecting SHAP Explanation of Time Series Classification

📅 2025-09-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing time-series segmentation strategies for SHAP-based explanations lack theoretical grounding, leading to suboptimal trade-offs between interpretability quality and computational efficiency. Method: We systematically evaluate eight segmentation algorithms and identify segment count, rather than algorithm choice, as the dominant factor affecting explanation fidelity and speed; equal-length segmentation achieves high explanation quality with significantly reduced computational overhead. We further propose a length-weighted normalization technique to mitigate attribution bias arising from heterogeneous segment lengths. Contribution/Results: Evaluated across diverse models (e.g., ResNet, InceptionTime) and datasets using two XAI metrics, InterpretTime and AUC Difference, our approach consistently improves SHAP explanation accuracy on both univariate and multivariate time series. It establishes a reproducible, efficient, and robust practical paradigm for time-series explainability, addressing critical gaps in segmentation design and attribution calibration.
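The length-weighted normalization is described only at a high level above. A minimal sketch of one plausible reading, in which each segment's attribution is divided by its length so that long segments do not accumulate disproportionately large attribution mass (function names are hypothetical and the paper's exact formula may differ):

```python
import numpy as np

def length_weighted_normalise(attributions, segment_lengths):
    """Divide each segment's SHAP attribution by its length.

    Hypothetical reading of the paper's length-weighted normalisation:
    without it, longer segments tend to receive larger raw attributions
    simply because they cover more time points.
    """
    a = np.asarray(attributions, dtype=float)
    lengths = np.asarray(segment_lengths, dtype=float)
    return a / lengths

def per_timepoint_attribution(attributions, segment_lengths):
    """Expand normalised segment scores into a per-time-point saliency map."""
    norm = length_weighted_normalise(attributions, segment_lengths)
    return np.repeat(norm, segment_lengths)
```

For example, a 4-point segment with raw attribution 2.0 and a 1-point segment with attribution 1.0 both receive a per-point score that is comparable after normalization, rather than the longer segment dominating.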

📝 Abstract
Explainable AI (XAI) has become an increasingly important topic for understanding and attributing the predictions made by complex Time Series Classification (TSC) models. Among attribution methods, SHapley Additive exPlanations (SHAP) is widely regarded as one of the most reliable, but its computational complexity, which scales exponentially with the number of features, limits its practicality for long time series. To address this, recent studies have shown that aggregating features via segmentation, so that a single attribution value is computed for a group of consecutive time points, drastically reduces SHAP running time. However, the choice of the optimal segmentation strategy remains an open question. In this work, we investigate eight different time series segmentation algorithms to understand how segment composition affects explanation quality. We evaluate these approaches using two established XAI evaluation methodologies: InterpretTime and AUC Difference. Through experiments on both Multivariate (MTS) and Univariate Time Series (UTS), we find that the number of segments has a greater impact on explanation quality than the specific segmentation method. Notably, equal-length segmentation consistently outperforms most custom time series segmentation algorithms. Furthermore, we introduce a novel attribution normalisation technique that weights segments by their length, and we show that it consistently improves attribution quality.
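The segmentation idea in the abstract, grouping consecutive time points so SHAP treats each group as a single feature, can be sketched as follows. This is an illustrative sketch, not the paper's implementation; the function names and the zero-background masking choice are assumptions:

```python
import numpy as np

def equal_length_segments(n_timepoints, n_segments):
    """Partition [0, n_timepoints) into near-equal consecutive segments,
    returned as (start, end) index pairs."""
    bounds = np.linspace(0, n_timepoints, n_segments + 1, dtype=int)
    return list(zip(bounds[:-1], bounds[1:]))

def mask_segments(x, segments, keep, background=0.0):
    """Build a perturbed series in which segments outside `keep` are
    replaced by a background value -- the coalition masking that
    segment-level SHAP evaluates, one call per coalition."""
    x_masked = np.full_like(x, background, dtype=float)
    for idx in keep:
        s, e = segments[idx]
        x_masked[s:e] = x[s:e]
    return x_masked
```

With k segments, SHAP only has to enumerate coalitions over k features instead of one feature per time point, which is the source of the speed-up the abstract describes.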
Problem

Research questions and friction points this paper is trying to address.

Evaluating segmentation strategies for SHAP in time series classification
Assessing impact of segment number on explanation quality
Improving attribution quality with length-weighted normalization technique
Innovation

Methods, ideas, or system contributions that make the work stand out.

Equal-length segmentation outperforms custom algorithms
Novel length-weighted attribution normalization improves quality
Segment count impacts explanation quality more than method
Nikos Papadeas
School of Engineering Mathematics and Technology, University of Bristol, UK
Davide Italo Serramazza
School of Computer Science, University College Dublin, Ireland
Zahraa Abdallah
School of Engineering Mathematics and Technology, University of Bristol, UK
Georgiana Ifrim
Associate Professor, University College Dublin, Insight Centre for Data Analytics & ML-Labs
Machine Learning · Sequence Learning · Time Series · Explainable AI