🤖 AI Summary
This work addresses the fragmentation and attribute sparsity of Decentraland virtual land data by constructing IITP-VDLand, a large multi-source dataset organized into four fragments: parcel characteristics, OpenSea trading history, Ethereum activity transactions, and social media interactions. It introduces a Rarity Score that quantifies how unique each parcel is within the virtual world. The data are gathered from Decentraland, OpenSea, Etherscan, Google BigQuery, and social media platforms using available APIs and custom scripts, then curated and organized. Benchmarking more than 20 price prediction models shows that Extra Trees performs best (R² = 0.8251 for the regressor, accuracy = 74.23% for the classifier), and that ensemble models outperform both deep learning and linear models on this dataset. Coordinates, geographical proximity, the Rarity Score, and a few economic indicators are the most predictive features.
📝 Abstract
This paper presents a comprehensive Decentraland parcels dataset, called IITP-VDLand, sourced from diverse platforms such as Decentraland, OpenSea, Etherscan, Google BigQuery, and various social media platforms. Unlike existing datasets, which have limited attributes and records, IITP-VDLand offers a rich array of attributes, encompassing parcel characteristics, trading history, past activities, transactions, and social media interactions. We also introduce a key attribute in the dataset, the Rarity score, which measures the uniqueness of each parcel within the virtual world. Because this data is dispersed across many sources, we gather it systematically, using both available APIs and custom scripts. We then curate and organize the information into four distinct fragments: (1) Characteristics, (2) OpenSea Trading History, (3) Ethereum Activity Transactions, and (4) Social Media. We envisage that this dataset will serve as a robust resource for training machine- and deep-learning models designed to address real-world challenges within the domain of Decentraland parcels. Benchmarking more than 20 state-of-the-art price prediction models on our dataset yields promising results, with a maximum R² score of 0.8251 and an accuracy of 74.23%, achieved by the Extra Trees Regressor and Classifier, respectively. The key findings reveal that ensemble models perform better than both deep learning and linear models on our dataset. We observe a significant impact of coordinates, geographical proximity, Rarity score, and a few other economic indicators on the prediction of parcel prices.
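The two headline metrics above, R² for regression and accuracy for classification, can be illustrated with a minimal sketch. The function names, the toy prices, and the price-class labels below are assumptions made up for illustration; they are not values from IITP-VDLand and not the authors' evaluation code.

```python
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


def accuracy(labels_true: Sequence[str], labels_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted class matches the true class."""
    if len(labels_true) != len(labels_pred) or len(labels_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return correct / len(labels_true)


# Toy parcel prices (hypothetical units) and hypothetical price classes.
true_prices = [10.0, 20.0, 30.0, 40.0]
pred_prices = [12.0, 18.0, 33.0, 39.0]
true_classes = ["low", "mid", "high", "high"]
pred_classes = ["low", "high", "high", "high"]

print(round(r2_score(true_prices, pred_prices), 3))
print(accuracy(true_classes, pred_classes))
```

Read this way, an R² of 0.8251 means the regressor's squared error is about 17.5% of the squared error of a predictor that always outputs the mean price, and an accuracy of 74.23% means roughly three in four parcels are assigned the correct class.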