Partitioning Strategies for Parallel Computation of Flexible Skylines

📅 2025-01-07
🤖 AI Summary
To address the low efficiency of flexible skyline queries on large datasets, and the difficulty of honoring user-specified preferences on attribute weights while remaining scalable, this paper proposes a two-stage parallel computing framework. In the first stage, a divide-and-conquer, multi-dimensional partitioning strategy implemented in PySpark prunes dominated tuples in parallel under the user's constraints on attribute weights. The second stage is a sequential phase. Since it is the most time-consuming part of the computation, we further devise a fully parallel variant that eliminates this stage entirely. Our key contribution lies in combining attribute weight modeling, pre-filtering-based pruning, and distributed execution. Extensive experiments on datasets of varying sizes and dimensionality show that the fully parallel variant achieves a 3.2× speedup over state-of-the-art baselines, while the pre-filtering mechanism reduces end-to-end query latency by 47%.
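
The dominance pruning mentioned above can be illustrated with a minimal Python sketch. It assumes the common formulation of flexible skylines over linear scoring functions f_w(t) = Σ w_i·t_i, where lower scores are better and the weight vectors w range over a convex polytope defined by the user's constraints. Under that assumption, t F-dominates s when two conditions hold: f_w(t) ≤ f_w(s) for every admissible w, and f_w(t) < f_w(s) for at least one. Because f_w(t) − f_w(s) is linear in w, checking the vertices of the polytope suffices. The function names and the example constraint w1 ≥ w2 are hypothetical illustrations, not the paper's implementation. The constraint is taken together with w1 + w2 = 1 and w ≥ 0, so the polytope's vertices are (1, 0) and (0.5, 0.5).

```python
from typing import List, Sequence

Vector = Sequence[float]


def score(w: Vector, t: Vector) -> float:
    """Linear scoring function f_w(t) = sum_i w_i * t_i (lower is better)."""
    return sum(wi * ti for wi, ti in zip(w, t))


def f_dominates(t: Vector, s: Vector, vertices: Sequence[Vector]) -> bool:
    """Return True if t F-dominates s for every weight vector in the polytope
    spanned by `vertices`. f_w(t) - f_w(s) is linear in w, so its sign over the
    whole polytope is determined by its values at the vertices."""
    diffs = [score(w, t) - score(w, s) for w in vertices]
    return all(d <= 0 for d in diffs) and any(d < 0 for d in diffs)


def nondominated(data: Sequence[Vector], vertices: Sequence[Vector]) -> List[Vector]:
    """Nested-loop flexible skyline: keep the tuples that no tuple F-dominates.
    A tuple never F-dominates itself (all differences are 0), so no self-check is needed."""
    return [s for s in data if not any(f_dominates(t, s, vertices) for t in data)]


# Hypothetical constraint w1 >= w2 with w1 + w2 = 1 and w >= 0.
VERTICES = [(1.0, 0.0), (0.5, 0.5)]
data = [(1, 5), (3, 2), (4, 4), (2, 6), (2, 3)]

# (3, 2) is in the classical (Pareto) skyline, but (2, 3) F-dominates it here.
print(nondominated(data, VERTICES))  # → [(1, 5), (2, 3)]
```

The classical skyline of this data is {(1, 5), (3, 2), (2, 3)}. However, (3, 2) scores better than (2, 3) only when w2 > w1, and the constraint rules that out. This is how weight constraints further reduce the data returned.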

📝 Abstract
While classical skyline queries identify interesting data within large datasets, flexible skylines introduce preferences through constraints on attribute weights and thereby further reduce the amount of data returned. However, computing these queries can be time-consuming for large datasets. We propose and implement a parallel computation scheme consisting of a parallel phase followed by a sequential phase, and apply it to flexible skylines. We further assess the effect of two refinements: an initial filtering phase that reduces the dataset size before parallel processing, and the complete elimination of the sequential phase, which is the most time-consuming one. All our experiments are executed in the PySpark framework on a number of datasets of varying sizes and dimensions.
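
The overall scheme can be sketched in plain Python. It consists of a parallel phase, a sequential phase, an optional initial filter, and a variant without the sequential phase. This is a minimal, hypothetical sketch rather than the authors' PySpark code, and it makes several assumptions:

* Scoring is linear, with weights given by the vertices of the constraint polytope.
* Simple round-robin partitioning stands in for the partitioning strategies the paper studies.
* Partitions run on threads. This shows the structure but gives no CPU speedup under CPython's GIL.

The pivot-based `prefilter` and the `fully_parallel` option are also illustrative assumptions, not the paper's filtering phase or its fully parallel variant. In PySpark, the per-partition step would naturally map onto `RDD.mapPartitions`, and the shared candidate list onto a broadcast variable.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def f_dominates(t: Point, s: Point, vertices: Sequence[Point]) -> bool:
    """t F-dominates s: no worse under every vertex weight, strictly better under one."""
    diffs = [sum(wi * (ti - si) for wi, ti, si in zip(w, t, s)) for w in vertices]
    return all(d <= 0 for d in diffs) and any(d < 0 for d in diffs)


def local_nd(points: Sequence[Point], vertices: Sequence[Point]) -> List[Point]:
    """Nested-loop flexible skyline of a single partition."""
    return [s for s in points if not any(f_dominates(t, s, vertices) for t in points)]


def prefilter(data: Sequence[Point], vertices: Sequence[Point]) -> List[Point]:
    """Hypothetical pre-filter: drop tuples F-dominated by a pivot, where the pivots
    are the best tuples under each vertex weight. Safe for any choice of pivots:
    removed tuples are F-dominated, and since F-dominance is transitive, anything
    they dominate is still dominated by a kept pivot."""
    if not data:
        return []
    pivots = [min(data, key=lambda t: sum(wi * ti for wi, ti in zip(w, t))) for w in vertices]
    return [s for s in data if not any(f_dominates(p, s, vertices) for p in pivots)]


def flexible_skyline(data: Sequence[Point], vertices: Sequence[Point], n_partitions: int = 4,
                     use_prefilter: bool = False, fully_parallel: bool = False) -> List[Point]:
    data = prefilter(data, vertices) if use_prefilter else list(data)
    # Round-robin partitioning (a placeholder for a real partitioning strategy).
    parts = [data[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor() as pool:
        # Parallel phase: each partition computes its local flexible skyline.
        local = list(pool.map(lambda p: local_nd(p, vertices), parts))
        candidates = [t for part in local for t in part]
        if fully_parallel:
            # No sequential phase: every partition checks its own survivors
            # against all candidates concurrently.
            kept = pool.map(
                lambda part: [s for s in part
                              if not any(f_dominates(t, s, vertices) for t in candidates)],
                local)
            return [t for part in kept for t in part]
    # Sequential phase: one final pass over the union of the local results.
    return local_nd(candidates, vertices)


VERTICES = [(1.0, 0.0), (0.5, 0.5)]  # hypothetical constraint w1 >= w2, w1 + w2 = 1
data = [(1, 5), (3, 2), (4, 4), (2, 6), (2, 3)]
print(sorted(flexible_skyline(data, VERTICES, n_partitions=2, use_prefilter=True)))
# → [(1, 5), (2, 3)]
```

Splitting the work is safe because F-dominance is transitive. Any tuple discarded inside a partition is dominated by some local survivor, so the final check only needs to compare the union of local results. In this example the pivot (3, 2) is not itself in the result, which illustrates that pivots only need to be real tuples, not skyline members.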

Problem

Big Data
Skyline Query Efficiency
Parallel Processing Strategy

Innovation

Parallel Strategy
Distributed Computing Framework
Data Pre-screening
Emilio De Lorenzis
Politecnico di Milano, DEIB
Davide Martinenghi
Politecnico di Milano
Databases, Logic, Ranking