Partitioning Strategies for Parallel Computation of Flexible Skylines

📅 2025-01-07

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Flexible skyline queries are slow on large datasets, and a practical solution must honor user-specified constraints on attribute weights while still scaling. This paper proposes a two-stage parallel computation framework implemented in PySpark. In the first stage, a divide-and-conquer multi-dimensional partitioning strategy prunes dominated tuples in parallel under the given weight constraints. The second stage is a sequential phase; because it is the most time-consuming part, the authors also devise a fully parallel variant that eliminates it entirely. The main contribution is the combination of attribute-weight modeling, pre-filtering-based pruning, and distributed execution. Experiments on datasets of varying sizes and dimensions show that the fully parallel variant achieves a 3.2× speedup over state-of-the-art baselines, while the pre-filtering mechanism reduces end-to-end query latency by 47%.
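To make dominance under weight constraints concrete, here is a minimal sketch and not the paper's implementation. It assumes linear scoring functions where lower scores are better, and it assumes the weight constraints describe a convex polytope given by its vertices. Under those assumptions, checking the score difference at each vertex is enough, because a linear function reaches its extremes on a polytope at the vertices. The names `score`, `f_dominates`, `flexible_skyline`, and `weight_vertices` are hypothetical.

```python
from typing import List, Sequence

Point = Sequence[float]


def score(weights: Point, point: Point) -> float:
    """Linear scoring function: weighted sum of attributes (lower is better)."""
    return sum(w * x for w, x in zip(weights, point))


def f_dominates(t: Point, s: Point, weight_vertices: List[Point]) -> bool:
    """Return True if t is never worse than s for any admissible weight
    vector, and strictly better for at least one.

    Assumption: the admissible weights form a convex polytope, so testing
    its vertices covers every weight vector inside it.
    """
    diffs = [score(w, t) - score(w, s) for w in weight_vertices]
    return all(d <= 0 for d in diffs) and any(d < 0 for d in diffs)


def flexible_skyline(points: List[Point], weight_vertices: List[Point]) -> List[Point]:
    """Keep the points that no other point dominates (quadratic scan)."""
    return [
        s for s in points
        if not any(f_dominates(t, s, weight_vertices) for t in points)
    ]


# Example constraint: w1 >= w2 with w1 + w2 = 1.
# The admissible weights form a segment with vertices (1, 0) and (0.5, 0.5).
vertices = [(1.0, 0.0), (0.5, 0.5)]
points = [(1, 4), (2, 2), (4, 1), (3, 3)]
result = flexible_skyline(points, vertices)
```

In this example the classical skyline would keep `(1, 4)`, `(2, 2)`, and `(4, 1)`. With the constraint that the first attribute weighs at least as much as the second, `(4, 1)` is also dominated, which illustrates how weight constraints shrink the result.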

📝 Abstract

While classical skyline queries identify interesting data within large datasets, flexible skylines introduce preferences through constraints on attribute weights, and further reduce the data returned. However, computing these queries can be time-consuming for large datasets. We propose and implement a parallel computation scheme consisting of a parallel phase followed by a sequential phase, and apply it to flexible skylines. We assess the additional effect of an initial filtering phase to reduce dataset size before parallel processing, and the elimination of the sequential part (the most time-consuming) altogether. All our experiments are executed in the PySpark framework for a number of different datasets of varying sizes and dimensions.
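The parallel phase followed by a sequential phase can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: a thread pool stands in for the cluster (in PySpark, an operation such as `RDD.mapPartitions` could play the role of the parallel step), the round-robin `partition` helper is a hypothetical stand-in for the partitioning strategies the paper studies, and classical Pareto dominance with lower values preferred is used as the default test. The `dom` parameter is where a dominance test under weight constraints would be supplied.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Point = Sequence[float]
Dominance = Callable[[Point, Point], bool]


def dominates(t: Point, s: Point) -> bool:
    """Classical Pareto dominance, assuming lower values are better."""
    return all(a <= b for a, b in zip(t, s)) and any(a < b for a, b in zip(t, s))


def local_skyline(points: List[Point], dom: Dominance = dominates) -> List[Point]:
    """Quadratic scan that keeps every point no other point dominates."""
    return [s for s in points if not any(dom(t, s) for t in points)]


def partition(points: List[Point], n_parts: int) -> List[List[Point]]:
    """Hypothetical round-robin split; drops empty partitions."""
    parts = [points[i::n_parts] for i in range(n_parts)]
    return [p for p in parts if p]


def two_phase_skyline(points: List[Point], n_parts: int = 4,
                      dom: Dominance = dominates) -> List[Point]:
    # Parallel phase: each partition computes its own local skyline.
    with ThreadPoolExecutor() as pool:
        local_results = list(pool.map(lambda p: local_skyline(p, dom),
                                      partition(points, n_parts)))
    # Sequential phase: merge the local survivors and filter once more,
    # since a point that survives locally can be dominated by a point
    # from another partition.
    candidates = [pt for part in local_results for pt in part]
    return local_skyline(candidates, dom)


data = [(1, 4), (2, 2), (4, 1), (3, 3), (5, 5), (2, 5), (1, 6)]
skyline = sorted(two_phase_skyline(data, n_parts=3))
```

The sequential phase works on the union of all local skylines, so its cost depends on how many points survive the parallel phase. That is one way to see why filtering the dataset beforehand, or avoiding the sequential pass, can matter for total running time.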

Problem

Research questions and friction points this paper is trying to address.

Big Data

Skyline Query Efficiency

Parallel Processing Strategy

Innovation

Methods, ideas, or system contributions that make the work stand out.

Parallel Strategy

Distributed Computing Framework

Data Pre-screening
