🤖 AI Summary
This paper addresses the spatial data procurement scenario by formalizing the Budgeted Maximum Coverage with Connectivity Constraint (BMCC) problem: selecting a subset of spatial datasets under a budget to maximize geographic coverage area while keeping the covered region topologically connected. We propose two greedy algorithms with provable approximation guarantees and polynomial-time complexity. To improve efficiency, we introduce two acceleration strategies: spatial indexing and graph-connectivity-based pruning. Extensive experiments on five real-world datasets show that our algorithms solve instances in milliseconds, achieve an average coverage ratio above 92% (close to optimal), satisfy the connectivity constraint in 100% of cases, and run up to 8.3× faster than baseline methods.
📝 Abstract
Data is undoubtedly becoming a commodity like oil, land, and labor in the 21st century. Although many data marketplaces trade data successfully, they do not consider the case where a buyer wants to acquire a collection of datasets (rather than a single one) and the overall spatial coverage and connectivity of that collection matter. In this paper, we make the first attempt to formulate this problem as Budgeted Maximum Coverage with Connectivity Constraint (BMCC), which aims to acquire a dataset collection with the maximum spatial coverage under a limited budget while maintaining spatial connectivity. To solve the problem, we propose two approximation algorithms with detailed theoretical guarantees and time complexity analysis, followed by two acceleration strategies that further improve their efficiency. Experiments on five real-world spatial dataset collections verify the efficiency and effectiveness of our algorithms.
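To make the problem statement concrete, the following is a minimal sketch of a budget-aware greedy heuristic for a BMCC-style instance. It is not the paper's algorithm and carries no approximation guarantee. It rests on several assumptions that are not in the abstract: each dataset is modeled as a set of grid cells, coverage is the number of distinct cells, all costs are positive, two datasets count as connected when they share a cell or have 4-adjacent cells, and the buyer supplies a starting dataset. The names `touches`, `greedy_connected_cover`, and `seed` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Cell = Tuple[int, int]  # hypothetical grid-cell id: (column, row)


def touches(a: Set[Cell], b: Set[Cell]) -> bool:
    """Return True if the two cell sets share a cell or have 4-adjacent cells."""
    for (x, y) in a:
        for cell in ((x, y), (x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if cell in b:
                return True
    return False


def greedy_connected_cover(
    datasets: Dict[str, Set[Cell]],
    costs: Dict[str, float],
    budget: float,
    seed: str,
) -> Tuple[List[str], Set[Cell]]:
    """Greedily grow a connected selection from `seed` without exceeding `budget`.

    Assumption: costs are strictly positive. Each step adds the affordable
    dataset that touches the current selection and has the best ratio of
    newly covered cells to cost.
    """
    if costs[seed] > budget:
        return [], set()

    chosen = [seed]
    covered = set(datasets[seed])
    spent = costs[seed]
    remaining = set(datasets) - {seed}

    while True:
        best, best_ratio = None, 0.0
        for name in sorted(remaining):  # sorted for deterministic tie-breaking
            if spent + costs[name] > budget:
                continue  # would exceed the budget
            if not touches(datasets[name], covered):
                continue  # would break connectivity of the selection
            gain = len(datasets[name] - covered)  # marginal coverage
            ratio = gain / costs[name]
            if ratio > best_ratio:
                best, best_ratio = name, ratio
        if best is None:
            break  # nothing affordable, connected, and useful is left
        chosen.append(best)
        covered |= datasets[best]
        spent += costs[best]
        remaining.remove(best)

    return chosen, covered


# Toy instance: "C" is cheap and large but far from everything else.
datasets = {
    "A": {(0, 0), (1, 0)},
    "B": {(2, 0), (3, 0), (4, 0)},
    "C": {(10, 10), (11, 10), (12, 10), (13, 10)},
    "D": {(1, 0), (1, 1)},
}
costs = {"A": 2.0, "B": 2.0, "C": 1.0, "D": 1.0}

chosen, covered = greedy_connected_cover(datasets, costs, budget=5.0, seed="A")
print(chosen, len(covered))
```

In this toy instance the isolated dataset `C` is never selected even though it has the best coverage per unit cost, which is the effect a connectivity constraint adds on top of plain budgeted maximum coverage. Because every added dataset touches at least one dataset already chosen, the selection stays connected at the dataset level under the adjacency assumption above.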