🤖 AI Summary
To address the quadratic time complexity of Transformers and the inability of RNNs to model bidirectional dependencies in long-term time series forecasting, this paper proposes the Bidirectional Linear Recurrent Unit (BLUR)—a novel sequence modeling architecture achieving both linear time complexity and explicit bidirectional dependency capture. BLUR integrates forward and backward linear recurrent units to jointly model historical and future contextual information, underpinned by a theoretically grounded stable initialization scheme ensuring training convergence. We further design an end-to-end multi-source time series fusion framework based on BLUR. Extensive experiments on image sequences and real-world meteorological and energy datasets demonstrate that BLUR outperforms both Transformers and conventional RNNs in forecasting accuracy, while accelerating training by 3.2× and reducing memory consumption by 57%. To our knowledge, BLUR is the first linear-time RNN variant to simultaneously surpass Transformers in both accuracy and efficiency.
📝 Abstract
Sequence modeling is a critical yet challenging task with wide-ranging applications, especially in time series forecasting for domains like weather prediction, temperature monitoring, and energy load forecasting. Transformers, with their attention mechanism, have emerged as state-of-the-art due to their efficient parallel training, but they suffer from quadratic time complexity, limiting their scalability for long sequences. In contrast, recurrent neural networks (RNNs) offer linear time complexity, spurring renewed interest in linear RNNs for more computationally efficient sequence modeling. In this work, we introduce BLUR (Bidirectional Linear Unit for Recurrent network), which uses forward and backward linear recurrent units (LRUs) to capture both past and future dependencies with high computational efficiency. BLUR maintains the linear time complexity of traditional RNNs, while enabling fast parallel training through LRUs. Furthermore, it offers provably stable training and strong approximation capabilities, making it highly effective for modeling long-term dependencies. Extensive experiments on sequential image and time series datasets reveal that BLUR not only surpasses transformers and traditional RNNs in accuracy but also significantly reduces computational costs, making it particularly suitable for real-world forecasting tasks. Our code is available here.