🤖 AI Summary
This work addresses the pervasive structural redundancy in time series foundation models (TSFMs), which undermines their reliability and efficiency. Through large-scale evaluation and mechanistic interpretability analysis, we find that mainstream TSFMs are robust to the removal of entire layers, and we identify the specific attention heads responsible for degenerate behaviors such as repetitive motif copying and seasonality bias. We propose an intrinsic pruning strategy based on the stable rank of per-head projection matrices, offering a systematic characterization of the redundancy mechanisms shared across diverse TSFMs and of the origins of their degradation. By combining component ablation, direct logit attribution on the residual stream, and a theoretical framework interpreting Transformers as kernel regressors, we validate both layer-wise redundancy and the critical role of particular attention heads on multiple real-world and synthetic datasets, establishing a pathway toward more efficient and reliable time series modeling.
📝 Abstract
Time Series Foundation Models (TSFMs) leverage extensive pretraining to accurately predict unseen time series during inference, without the need for task-specific fine-tuning. Through large-scale evaluations on standard benchmarks, we find that leading transformer-based TSFMs exhibit redundant components in their intermediate layers. We introduce a set of tools for mechanistic interpretability of TSFMs, including ablations of specific components and direct logit attribution on the residual stream. Our findings are consistent across several leading TSFMs with diverse architectures, and across a broad set of real-world and synthetic time-series datasets. We discover that all models in our study are robust to ablations of entire layers. Furthermore, we develop a theoretical framework framing transformers as kernel regressors, motivating a purely intrinsic strategy for ablating heads based on the stable rank of the per-head projection matrices. Using this approach, we uncover the specific heads responsible for degenerate phenomena widely observed in TSFMs, such as parroting of motifs from the context and seasonality bias. Our study sheds light on the universal properties of this emerging class of architectures for continuous-time sequence modeling.
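The pruning criterion mentioned in the abstract, stable rank, is a standard matrix quantity defined as the squared Frobenius norm divided by the squared spectral norm, \(\mathrm{srank}(W) = \|W\|_F^2 / \|W\|_2^2\). The sketch below computes it for a generic weight matrix; the specific per-head projection matrices and any ablation threshold used in the paper are not reproduced here.

```python
import numpy as np

def stable_rank(W: np.ndarray) -> float:
    """Stable rank ||W||_F^2 / ||W||_2^2, a smooth surrogate for matrix rank.

    Intuitively, heads whose projection matrices have low stable rank
    concentrate their action in very few directions, making them natural
    candidates for ablation under an intrinsic (data-free) criterion.
    """
    fro_sq = float(np.sum(W ** 2))           # squared Frobenius norm
    spec = float(np.linalg.norm(W, ord=2))   # largest singular value
    return fro_sq / spec ** 2

# Toy checks: a rank-1 matrix has stable rank 1; the n x n identity has stable rank n.
W_rank1 = np.outer(np.arange(1.0, 5.0), np.ones(3))
print(stable_rank(W_rank1))      # ~1.0
print(stable_rank(np.eye(8)))    # 8.0
```

Note that stable rank depends only on the weights themselves, which is what makes the strategy "purely intrinsic": no forward passes or validation data are required to score heads.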