🤖 AI Summary
This work addresses the limited discriminative power of proximity measures in time series classification. It proposes PF-GAP, which extends RF-GAP proximities, originally defined for random forests, to Proximity Forests, a tree ensemble for time series classification. PF-GAP proximities are combined with multidimensional scaling (MDS) to produce vector embeddings of univariate time series, and with Local Outlier Factor (LOF) to examine how misclassified samples relate to outliers. The embeddings are compared against those obtained from conventional time series distance measures such as DTW and Euclidean distance. The forest proximities appear to link misclassified points to outliers more strongly than nearest neighbor classifiers built on those distance measures, which suggests PF-GAP is useful for anomaly attribution and model interpretability in time series analysis.
📝 Abstract
RF-GAP has recently been introduced as an improved random forest proximity measure. In this paper, we present PF-GAP, an extension of RF-GAP proximities to proximity forests, an accurate and efficient time series classification model. We use the forest proximities in connection with Multi-Dimensional Scaling to obtain vector embeddings of univariate time series, comparing the embeddings to those obtained using various time series distance measures. We also use the forest proximities alongside Local Outlier Factors to investigate the connection between misclassified points and outliers, comparing with nearest neighbor classifiers which use time series distance measures. We show that the forest proximities seem to exhibit a stronger connection between misclassified points and outliers than nearest neighbor classifiers.
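The two uses of the proximities described above (embedding via MDS and outlier scoring via LOF) can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_prox` is a synthetic Gaussian-kernel matrix standing in for PF-GAP proximities, the conversion `1 - proximity` is one assumed way to obtain dissimilarities, classical (Torgerson) MDS is used because the abstract does not specify an MDS variant, and all function names are hypothetical.

```python
import numpy as np

def proximity_to_dissimilarity(prox):
    """Symmetrize a proximity matrix and map it to dissimilarities.

    Assumption: proximities lie in [0, 1] and dissimilarity = 1 - proximity.
    """
    sym = np.clip((prox + prox.T) / 2.0, 0.0, 1.0)
    dis = 1.0 - sym
    np.fill_diagonal(dis, 0.0)  # each point is at dissimilarity 0 from itself
    return dis

def classical_mds(dis, n_components=2):
    """Classical (Torgerson) MDS: eigendecompose the double-centered matrix."""
    n = dis.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dis ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)          # eigenvalues ascending
    top = np.argsort(eigvals)[::-1][:n_components]   # largest eigenvalues first
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

def local_outlier_factor(dis, k=3):
    """LOF scores from precomputed dissimilarities (higher = more outlying)."""
    n = dis.shape[0]
    rows = np.arange(n)
    masked = dis.copy()
    np.fill_diagonal(masked, np.inf)                 # a point is not its own neighbor
    neighbors = np.argsort(masked, axis=1)[:, :k]    # indices of the k nearest points
    k_distance = masked[rows, neighbors[:, -1]]      # distance to the k-th neighbor
    # Reachability distance of i from neighbor o: max(k_distance(o), d(i, o)).
    reach = np.maximum(k_distance[neighbors], masked[rows[:, None], neighbors])
    lrd = 1.0 / (reach.mean(axis=1) + 1e-12)         # local reachability density
    return lrd[neighbors].mean(axis=1) / lrd

# Toy stand-in for forest proximities: two 3x3 grids of points plus one
# isolated point, with a Gaussian kernel playing the role of the proximity.
grid = np.array([[x, y] for x in (0.0, 0.5, 1.0) for y in (0.0, 0.5, 1.0)])
points = np.vstack([grid, grid + [5.0, 0.0], [[2.5, 6.0]]])
sq_dist = ((points[:, None, :] - points[None, :, :]) ** 2).sum(axis=-1)
toy_prox = np.exp(-sq_dist)
np.fill_diagonal(toy_prox, 0.0)                      # no self-proximity in this toy

dis = proximity_to_dissimilarity(toy_prox)
embedding = classical_mds(dis, n_components=2)
lof = local_outlier_factor(dis, k=3)

print("embedding shape:", embedding.shape)
print("index with largest LOF:", int(np.argmax(lof)))
```

In this toy setup the isolated point (the last row) receives the largest LOF score. With real forest proximities, the same scores could be cross-tabulated against the classifier's misclassified samples to examine the connection the abstract describes.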