🤖 AI Summary
This study systematically investigates the impact of data source diversity on cryptocurrency price forecasting performance. To address this, we propose the Crypto100 index and develop a domain-aware feature selection and dimensionality reduction framework tailored to heterogeneous multi-source data—including on-chain metrics, technical indicators, sentiment signals, traditional financial market data, and macroeconomic variables—integrated within a rolling-window modeling framework and evaluated via LSTM and multivariate regression benchmarks. Our key empirical contribution is the first identification that on-chain features contribute over 40% to short-horizon forecasting accuracy, establishing their centrality; conversely, macroeconomic and traditional market indicators exhibit markedly increasing importance with longer forecast horizons. Experimental results demonstrate that the proposed methodology reduces mean absolute error (MAE) by an average of 18.7% across short-, medium-, and long-term forecasting tasks, significantly enhancing model generalizability and stability.
📝 Abstract
This study investigates the impact of data source diversity on the performance of cryptocurrency forecasting models by integrating various data categories, including technical indicators, on-chain metrics, sentiment and interest metrics, traditional market indices, and macroeconomic indicators. We introduce the Crypto100 index, representing the top 100 cryptocurrencies by market capitalization, and propose a novel feature reduction algorithm to identify the most impactful and resilient features from diverse data sources. Our comprehensive experiments demonstrate that data source diversity significantly enhances the predictive performance of forecasting models across different time horizons. Key findings include the paramount importance of on-chain metrics for both short-term and long-term predictions, the growing relevance of traditional market indices and macroeconomic indicators for longer-term forecasts, and substantial improvements in model accuracy when diverse data sources are utilized. These insights help demystify the short-term and long-term driving factors of the cryptocurrency market and lay the groundwork for developing more accurate and resilient forecasting models.