SOLAR: Scalable Distributed Spatial Joins through Learning-based Optimization

📅 2025-04-02
🤖 AI Summary
Problem: Distributed spatial join systems repartition data for each query even when the datasets resemble ones they have already processed, which causes redundant computation. Method: This paper introduces SOLAR, a learning-based optimization framework for distributed spatial joins. It models spatial data characteristics and uses a trained learning model to assess how similar the datasets in a new query are to those in historical query workloads, then decides whether to reuse an existing partitioner or to partition from scratch. Contribution/Results: Experiments on real-world datasets show that SOLAR reduces end-to-end join latency by up to 72.2% (a 3.6× speedup) and accelerates the partitioning phase by up to 2.71× compared with state-of-the-art systems.
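The two headline figures for join latency describe the same measurement. A speedup factor s corresponds to a latency reduction of 1 - 1/s, so the conversion for the reported 3.6× speedup works out as follows (the symbols s, T_old, and T_new are introduced here for illustration only):

```latex
\[
s = \frac{T_{\text{old}}}{T_{\text{new}}} = 3.6
\quad\Longrightarrow\quad
\frac{T_{\text{old}} - T_{\text{new}}}{T_{\text{old}}}
  = 1 - \frac{1}{s}
  = 1 - \frac{1}{3.6}
  \approx 0.722 = 72.2\%
\]
```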

📝 Abstract
The proliferation of location-based services has led to massive spatial data generation. Spatial join is a crucial database operation that identifies pairs of objects from two spatial datasets based on spatial relationships. Due to the intensive computational demands, spatial joins are often executed in a distributed manner across clusters. However, current systems fail to recognize similarities in the partitioning of spatial data, leading to redundant computations and increased overhead. Recently, incorporating machine learning optimizations into database operations has enhanced efficiency in traditional joins by predicting optimal strategies. However, applying these optimizations to spatial joins poses challenges due to the complex nature of spatial relationships and the variability of spatial data. This paper introduces SOLAR, scalable distributed spatial joins through learning-based optimization. SOLAR operates through offline and online phases. In the offline phase, it learns balanced spatial partitioning based on the similarities between datasets in query workloads seen so far. In the online phase, when a new join query is received, SOLAR evaluates the similarity between the datasets in the new query and the already-seen workloads using the trained learning model. Then, it decides to either reuse an existing partitioner, avoiding unnecessary computational overhead, or partition from scratch. Our extensive experimental evaluation on real-world datasets demonstrates that SOLAR achieves up to 3.6X speedup in overall join runtime and 2.71X speedup in partitioning time compared to state-of-the-art systems.
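The online reuse-or-repartition decision described in the abstract can be illustrated with a minimal sketch. This is not SOLAR's implementation: the abstract says similarity is evaluated by a trained learning model, whereas the sketch substitutes a plain histogram-intersection score over a uniform grid as a stand-in. The names `grid_histogram`, `similarity`, `StoredPartitioner`, `choose_partitioner`, and the `threshold` value of 0.8 are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
Bounds = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def grid_histogram(points: Sequence[Point], bounds: Bounds, cells: int = 4) -> List[float]:
    """Summarize a dataset as the fraction of points in each cell of a cells x cells grid."""
    min_x, min_y, max_x, max_y = bounds
    counts = [0] * (cells * cells)
    for x, y in points:
        # Clamp so that points on or beyond the boundary fall into the edge cells.
        col = max(0, min(int((x - min_x) / (max_x - min_x) * cells), cells - 1))
        row = max(0, min(int((y - min_y) / (max_y - min_y) * cells), cells - 1))
        counts[row * cells + col] += 1
    total = len(points) or 1
    return [c / total for c in counts]


def similarity(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Histogram intersection in [0, 1]; a stand-in for the learned similarity model."""
    return sum(min(a, b) for a, b in zip(h1, h2))


@dataclass
class StoredPartitioner:
    """A partitioner built for an earlier workload, keyed by that dataset's summary."""
    name: str
    histogram: List[float]


def choose_partitioner(
    new_hist: Sequence[float],
    repository: Sequence[StoredPartitioner],
    threshold: float = 0.8,  # assumed cutoff, not taken from the paper
) -> Optional[StoredPartitioner]:
    """Return the most similar stored partitioner, or None to signal 'partition from scratch'."""
    best = max(repository, key=lambda p: similarity(new_hist, p.histogram), default=None)
    if best is not None and similarity(new_hist, best.histogram) >= threshold:
        return best
    return None
```

In this sketch, a return value of `None` means no previously seen dataset is similar enough, so the caller would build a new partitioner and add it to the repository for later queries.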
Problem

Research questions and friction points this paper is trying to address.

Optimizes distributed spatial joins using learning-based methods
Reduces redundant computations by reusing balanced spatial partitions
Improves efficiency in handling complex spatial data relationships
Innovation

Methods, ideas, or system contributions that make the work stand out.

Learning-based balanced spatial partitioning optimization
Reuse existing partitioners to reduce overhead
Offline and online phases for scalable joins
Yongyi Liu
University of California, Riverside
Ahmed Mahmood
Google LLC.
Amr Magdy
University of California, Riverside
Data management, spatial data management, GIS, large-scale data analytics, indexing
Minyao Zhu
Google LLC.