AI Summary
Existing time-series forecasting models under the pretraining-finetuning paradigm often neglect joint time-frequency modeling, hindering the simultaneous capture of periodicity and prior pattern knowledge and thus limiting performance on complex sequences. This paper proposes MoFE-Time, the first framework to integrate time-domain and frequency-domain representations within a pretraining-finetuning pipeline. It introduces frequency-domain expert units coupled with a sparse routing mechanism, operating alongside time-domain experts after the attention modules to form a Mixture-of-Experts (MoE) architecture. This design enables cross-task transfer of multi-periodic knowledge and constructs multidimensional sparse representations of input signals, substantially enhancing generalization. Evaluated on six public benchmarks, MoFE-Time achieves state-of-the-art performance, reducing MSE and MAE by 6.95% and 6.02%, respectively, over Time-MoE. Its effectiveness is further validated on a real-world new energy vehicle (NEV) sales dataset.
Abstract
Time series forecasting, a task over a prominent data modality, plays a pivotal role in diverse applications. With the remarkable advancements of Large Language Models (LLMs), adopting LLMs as the foundational architecture for time series modeling has gained significant attention. Although existing models achieve some success, they rarely model both time and frequency characteristics within a pretraining-finetuning paradigm, leading to suboptimal performance on complex time series, whose prediction requires modeling both periodicity and prior pattern knowledge of signals. We propose MoFE-Time, an innovative time series forecasting model that integrates time- and frequency-domain features within a Mixture-of-Experts (MoE) network. Moreover, we adopt the pretraining-finetuning paradigm as our training framework to effectively transfer prior pattern knowledge across pretraining and finetuning datasets with different periodicity distributions. Our method introduces both frequency and time cells as experts after the attention modules and leverages the MoE routing mechanism to construct multidimensional sparse representations of input signals. In experiments on six public benchmarks, MoFE-Time achieves new state-of-the-art performance, reducing MSE and MAE by 6.95% and 6.02% compared to the representative method Time-MoE. Beyond the existing evaluation benchmarks, we have developed a proprietary dataset, NEV-sales, derived from real-world business scenarios. Our method achieves outstanding results on this dataset, underscoring the effectiveness of MoFE-Time in practical commercial applications.
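The core architectural idea described above (time- and frequency-domain expert units after an attention block, combined by a sparse MoE router) can be illustrated with a minimal NumPy sketch. The abstract does not specify the exact expert designs or routing details, so everything here is an assumption for illustration: the time expert is a small feed-forward unit, the frequency expert applies a learnable per-bin filter in rFFT space, and the router picks the top-k experts per token.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class TimeExpert:
    """Hypothetical time-domain expert: a small ReLU feed-forward unit."""
    def __init__(self, d):
        self.w1 = rng.normal(0, 0.02, (d, 2 * d))
        self.w2 = rng.normal(0, 0.02, (2 * d, d))
    def __call__(self, h):
        return np.maximum(h @ self.w1, 0.0) @ self.w2

class FreqExpert:
    """Hypothetical frequency-domain expert: rFFT along the sequence axis,
    a learnable per-frequency filter, then the inverse transform."""
    def __init__(self, seq_len):
        self.filt = rng.normal(1.0, 0.02, seq_len // 2 + 1)
    def __call__(self, h):
        spec = np.fft.rfft(h, axis=0)          # (freq_bins, d)
        spec = spec * self.filt[:, None]       # per-frequency modulation
        return np.fft.irfft(spec, n=h.shape[0], axis=0)

def mofe_layer(h, experts, router_w, top_k=2):
    """Sparse MoE: route each token to its top-k experts, mix their outputs,
    and add a residual connection."""
    seq_len, d = h.shape
    gates = softmax(h @ router_w)                      # (seq_len, n_experts)
    expert_out = np.stack([e(h) for e in experts])     # (n_experts, seq_len, d)
    out = np.zeros_like(h)
    for t in range(seq_len):
        top = np.argsort(gates[t])[-top_k:]            # indices of top-k gates
        w = gates[t, top] / gates[t, top].sum()        # renormalise top-k gates
        out[t] = sum(wi * expert_out[ei, t] for wi, ei in zip(w, top))
    return h + out

# Demo: a mixed pool of two time experts and two frequency experts.
seq_len, d = 32, 16
experts = [TimeExpert(d), TimeExpert(d), FreqExpert(seq_len), FreqExpert(seq_len)]
router_w = rng.normal(0, 0.02, (d, len(experts)))
h = rng.normal(size=(seq_len, d))
y = mofe_layer(h, experts, router_w, top_k=2)
print(y.shape)  # (32, 16)
```

Computing every expert's output and then masking, as done here, is only for readability; a real sparse MoE would dispatch each token to its selected experts to save compute, and the paper's actual expert cells and gating may differ from this sketch.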