🤖 AI Summary
This study addresses the challenge of degraded data quality—specifically outliers and missing values—in long-horizon time series forecasting, which critically undermines model robustness. We establish a unified evaluation framework to systematically benchmark mainstream models—including LSTM, Prophet, XGBoost, and Random Forest—under three realistic data conditions: complete, noisy (outlier-contaminated), and incomplete (missing-value) sequences, with ARIMA as the baseline. Methodologically, we employ sliding-window modeling, multi-step rolling prediction, and adaptive imputation for preprocessing. Our key contributions include: (i) a novel, interpretable algorithm selection guideline grounded in data characteristics and forecasting requirements; and (ii) empirical findings demonstrating that XGBoost reduces average MAE by 23% under noise, while Prophet exhibits superior stability for long-term trend forecasting. The results provide reproducible, principled guidance for industrial-scale time series modeling.
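The sliding-window modeling and multi-step rolling prediction mentioned above can be sketched in miniature. This is a hypothetical illustration, not the paper's implementation: the window size, horizon, and the trivial window-mean predictor (standing in for LSTM, XGBoost, etc.) are all illustrative assumptions.

```python
def make_windows(series, window):
    """Sliding-window modeling: turn a 1-D series into
    (input window, next value) training pairs."""
    return [(series[i:i + window], series[i + window])
            for i in range(len(series) - window)]

def fit_mean_model(pairs):
    """Toy stand-in model: predicts the mean of the input window.
    A real study would fit LSTM/XGBoost/etc. on the pairs."""
    def predict(window):
        return sum(window) / len(window)
    return predict

def rolling_forecast(model, history, window, horizon):
    """Multi-step rolling prediction: each one-step forecast is
    appended to the input buffer and fed back for the next step."""
    buf = list(history[-window:])
    out = []
    for _ in range(horizon):
        yhat = model(buf)
        out.append(yhat)
        buf = buf[1:] + [yhat]
    return out

series = [float(x) for x in range(20)]   # synthetic example data
pairs = make_windows(series, window=4)
model = fit_mean_model(pairs)
preds = rolling_forecast(model, series, window=4, horizon=3)
```

Because rolling prediction feeds its own outputs back in, errors compound with the horizon, which is exactly why the long-term forecasting accuracy compared in this study is the demanding case.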
📝 Abstract
The explosion of Time Series (TS) data, driven by advancements in technology, necessitates sophisticated analytical methods. Modern management systems increasingly rely on analyzing this data, highlighting the importance of efficient processing techniques. State-of-the-art Machine Learning (ML) approaches for TS analysis and forecasting are becoming prevalent. This paper briefly describes and compiles suitable algorithms for the TS regression task. We compare these algorithms against each other and against the classic ARIMA method using diverse datasets: complete data, data with outliers, and data with missing values. The focus is on forecasting accuracy, particularly for long-term predictions. This research aids in selecting the most appropriate algorithm based on forecasting needs and data characteristics.