🤖 AI Summary
Existing Transformer-based time series forecasting methods model all token dependencies indiscriminately, ignoring that their effectiveness varies across scenarios, which limits predictive performance. To address this, we formally define "effective dependency" from a logical perspective—as the semantic-consistency-preserving dependency required for tokens to be interpreted as atomic formulas—and pioneer the integration of formal logic into attention mechanism design. We propose Attention Logic Regularization (Attn-L-Reg), comprising (i) an atomic formula alignment loss, (ii) sparsity-inducing regularization on attention maps under logical constraints, and (iii) differentiable structural pruning. This framework enables interpretable, sparsity-controllable dependency learning while remaining plug-and-play with zero training overhead. Extensive experiments on multiple benchmark datasets demonstrate an average 8.2% reduction in MAE. Furthermore, we theoretically establish a tighter generalization error bound for our method than for standard attention.
📝 Abstract
Time series forecasting (TSF) plays a crucial role in many applications, and Transformer-based methods are among the mainstream techniques for TSF. Existing methods treat all token dependencies equally. However, we find that the effectiveness of token dependencies varies across forecasting scenarios; by ignoring these differences, existing methods sacrifice performance. This raises two issues: (1) What are effective token dependencies? (2) How can we learn effective dependencies? From a logical perspective, we align Transformer-based TSF methods with a logical framework and define effective token dependencies as those that ensure the tokens behave as atomic formulas (Issue 1). We then align the learning process of Transformer methods with the process of deriving atomic formulas in logic, which inspires a method for learning these effective dependencies (Issue 2). Specifically, we propose Attention Logic Regularization (Attn-L-Reg), a plug-and-play method that guides the model to use fewer but more effective dependencies by making the attention map sparse, thereby ensuring the tokens behave as atomic formulas and improving prediction performance. Extensive experiments and theoretical analysis confirm the effectiveness of Attn-L-Reg.
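The abstract's core mechanism—an added loss term that pushes each query token to attend to fewer keys—can be illustrated with a minimal sketch. The paper text above does not specify the exact form of the regularizer, so this example uses a common stand-in (an entropy penalty on attention rows, weighted by a hypothetical coefficient `lam`); it is an assumption for illustration, not the authors' actual Attn-L-Reg formulation.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_map(Q, K):
    """Scaled dot-product attention weights; each row is a distribution over keys."""
    d = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d))

def sparsity_penalty(attn, eps=1e-12):
    """Mean row entropy of the attention map. Lower entropy means sparser rows,
    i.e. each query token depends on fewer key tokens. (Illustrative choice;
    the paper's regularizer may differ.)"""
    return float(-(attn * np.log(attn + eps)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
Q = rng.normal(size=(6, 8))  # 6 tokens, model dim 8
K = rng.normal(size=(6, 8))
A = attention_map(Q, K)

# Plug-and-play usage: total loss = forecasting loss + lam * penalty
lam = 0.1  # hypothetical regularization weight
reg_term = lam * sparsity_penalty(A)
```

Minimizing `reg_term` alongside the forecasting loss concentrates each attention row on a few keys, matching the abstract's "fewer but more effective dependencies" behavior without changing the base architecture.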