A Unified Approach for Multi-granularity Search over Spatial Datasets

📅 2024-12-06

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Existing spatial data search systems handle dataset-level (coarse-grained) and data-point-level (fine-grained) retrieval separately, so their indexing and optimization mechanisms are not shared across the two. Method: This paper proposes Spadas, a multi-granularity search system built on a unified spatial index that serves both dataset search and data point search while reducing indexing cost and removing outliers from datasets. On top of this index, it adds a set of pruning mechanisms: fast bound estimation, approximation with an error bound, and pruning in batch. The system supports diverse query types and multiple distance metrics. Contribution/Results: Evaluated on six spatial data repositories, Spadas achieves a speedup of 1–3 orders of magnitude over state-of-the-art methods. It has been deployed as a publicly accessible online service, and a case study demonstrates its effectiveness.

📝 Abstract

There has been increased interest in data search as a means to find relevant datasets or data points in data lakes and repositories. Although approaches have been proposed to support spatial dataset search and data point search, they consider the two types of search independently. To enable search operations ranging from the coarse-grained dataset level to the fine-grained data point level, we provide an integrated approach that supports diverse query types and distance metrics. In this paper, we focus on designing a multi-granularity spatial data search system, called Spadas, that supports both dataset and data point search operations. To address the high cost of indexing and the susceptibility to outliers, we propose a unified index that can drastically improve query efficiency in various scenarios by organizing data reasonably and removing outliers from datasets. Moreover, to accelerate all data search operations, we propose a set of pruning mechanisms based on the unified index, including fast bound estimation, an approximation technique with an error bound, and batch pruning techniques, to effectively filter out non-relevant datasets and points. Finally, we report the results of a detailed experimental evaluation on six spatial data repositories, showing that Spadas is orders of magnitude faster than state-of-the-art algorithms, and we demonstrate its effectiveness through a case study. An online spatial data search system based on Spadas is also implemented and made accessible to users.
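To make the idea of bound-based pruning concrete, here is a minimal Python sketch of how a dataset-level summary can filter out whole datasets during a point-level search. It is an illustration only, not the Spadas index or its algorithms: the `DatasetNode` class, the per-dataset bounding box, the helper names, and the choice of Euclidean distance are all assumptions made for this example.

```python
import heapq
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class DatasetNode:
    """Hypothetical index entry: one dataset summarised by its bounding box."""
    name: str
    points: List[Point]
    lo: Point  # lower-left corner of the bounding box
    hi: Point  # upper-right corner of the bounding box


def build_index(datasets: Dict[str, List[Point]]) -> List[DatasetNode]:
    """Compute one bounding box per dataset (the coarse-grained level)."""
    nodes = []
    for name, pts in datasets.items():
        xs, ys = zip(*pts)
        nodes.append(DatasetNode(name, pts, (min(xs), min(ys)), (max(xs), max(ys))))
    return nodes


def min_dist_to_box(q: Point, node: DatasetNode) -> float:
    """Lower bound on the Euclidean distance from q to any point in the dataset."""
    dx = max(node.lo[0] - q[0], 0.0, q[0] - node.hi[0])
    dy = max(node.lo[1] - q[1], 0.0, q[1] - node.hi[1])
    return math.hypot(dx, dy)


def knn_points(index: List[DatasetNode], q: Point, k: int):
    """Return the k nearest points across all datasets and how many datasets were scanned."""
    # Visit datasets in order of increasing lower bound.
    order = sorted(index, key=lambda n: min_dist_to_box(q, n))
    best = []  # max-heap via negated distances: (-distance, dataset name, point)
    scanned = 0
    for node in order:
        # Once k candidates exist, a dataset whose lower bound is no better than
        # the current k-th distance cannot improve the result; neither can any
        # later dataset, because they are sorted by lower bound.
        if len(best) == k and min_dist_to_box(q, node) >= -best[0][0]:
            break
        scanned += 1
        for p in node.points:
            d = math.dist(q, p)
            if len(best) < k:
                heapq.heappush(best, (-d, node.name, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, node.name, p))
    result = sorted((-neg_d, name, p) for neg_d, name, p in best)
    return result, scanned


datasets = {
    "a": [(0.0, 0.0), (1.0, 1.0)],
    "b": [(10.0, 10.0), (11.0, 11.0)],
    "c": [(100.0, 100.0)],
}
index = build_index(datasets)
result, scanned = knn_points(index, (0.5, 0.5), k=2)
print([name for _, name, _ in result], "datasets scanned:", scanned)
```

In this toy setup the query point lies inside the bounding box of dataset `a`, so the other two datasets are skipped without examining their points. The same lower bound could in principle also rank datasets for a coarse-grained query, which is the sense in which a single index can serve both granularities.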

Problem

Research questions and friction points this paper is trying to address.

Unified approach for multi-granularity spatial data search

Integrated system supporting dataset and data point queries

Efficient indexing and pruning to improve search performance

Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified index for multi-granularity spatial search

Pruning mechanisms to accelerate search operations

Integrated support for diverse query types
