Sketched Sum-Product Networks for Joins

📅 2025-06-16
🤖 AI Summary
Problem: Existing sketch-based cardinality estimation methods for multi-table joins generalize poorly to unseen queries because they rely on predefined query templates. Method: This paper proposes SPN-Sketch, the first framework integrating Sum-Product Networks (SPNs) with sketching techniques. It uses SPNs’ ability to decompose and model high-dimensional joint distributions to generate sketches online, in real time, for arbitrary new query conditions. Contribution/Results: SPN-Sketch removes the dependency on templates, enabling fine-grained compositional modeling and dynamic predicate inference at the attribute level. It achieves high estimation accuracy while drastically reducing sketch construction overhead. Experiments demonstrate strong generalization and scalability in query cost estimation, offering a plug-and-play, cost-driven cardinality estimator for query optimization.

📝 Abstract
Sketches have shown high accuracy in multi-way join cardinality estimation, a critical problem in cost-based query optimization. Accurately estimating the cardinality of a join operation, which is analogous to its computational cost, allows relational database systems to optimize query execution costs. However, although sketches are effective for query optimization, they are typically constructed for predefined selections in queries that are assumed to be given a priori, which hinders their applicability to new queries. As a more general solution, we propose using Sum-Product Networks to approximate sketches dynamically, on the fly. Sum-Product Networks can decompose and model multivariate distributions, such as relations, as linear combinations of multiple univariate distributions. By representing these univariate distributions as sketches, Sum-Product Networks can combine them element-wise to efficiently approximate the sketch of any query selection. These approximate sketches can then be applied to join cardinality estimation. In particular, we implement the Fast-AGMS and Bound Sketch methods, which have been used successfully in prior work despite their costly construction. By accurately approximating them instead, our work provides a practical alternative for applying these sketches to query optimization.
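
To make the estimation step concrete, the following is a minimal Python sketch of Fast-AGMS join size estimation as it is commonly described: each relation's join keys are hashed into rows of signed counters, and the join size is estimated as the median, over rows, of the dot product of the two sketches. It is an illustration only, not the authors' implementation. The class name `FastAGMS`, the helper `_hash`, the BLAKE2-based hashing, and the default `depth` and `width` are all assumptions made for this example.

```python
import hashlib
from statistics import median


def _hash(key, row, salt):
    """Deterministic 64-bit hash of (salt, row, key). Hypothetical helper."""
    data = f"{salt}:{row}:{key}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


class FastAGMS:
    """Toy Fast-AGMS sketch: `depth` rows of `width` signed counters."""

    def __init__(self, depth=5, width=64):
        self.depth = depth
        self.width = width
        self.counters = [[0] * width for _ in range(depth)]

    def update(self, key, count=1):
        # In every row, the key goes to one bucket with a +1/-1 sign.
        for r in range(self.depth):
            bucket = _hash(key, r, "bucket") % self.width
            sign = 1 if _hash(key, r, "sign") & 1 else -1
            self.counters[r][bucket] += sign * count


def estimate_join_size(a, b):
    """Median over rows of the row-wise dot product of two sketches."""
    per_row = [
        sum(x * y for x, y in zip(row_a, row_b))
        for row_a, row_b in zip(a.counters, b.counters)
    ]
    return median(per_row)


# Usage: both relations contain only join key 7, so no collisions can occur
# and the estimate equals the true join size 3 * 4.
orders, items = FastAGMS(), FastAGMS()
for _ in range(3):
    orders.update(7)
for _ in range(4):
    items.update(7)
print(estimate_join_size(orders, items))  # 12
```

With many distinct keys, collisions inside a bucket add noise to each row's dot product, which is why the median over several rows is taken. The cost the abstract refers to comes from having to scan the selected tuples to build such a sketch for every new selection.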
Problem

Research questions and friction points this paper is trying to address.

Estimating join cardinality for query optimization
Dynamic sketch approximation for new queries
Efficiently combining univariate distributions as sketches
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic sketch approximation via Sum-Product Networks
Element-wise combination of univariate sketch distributions
Fast-AGMS and Bound Sketch approximation for joins
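
The element-wise combination can be illustrated with a toy example. The sketch below assumes a linear sketch (a single Fast-AGMS-style row), a sum node that mixes two row clusters, and product nodes whose leaves (a join key and a `color` attribute) are independent within each cluster. Under those assumptions, the sketch of a selection is a weighted element-wise sum of the per-cluster key sketches, each scaled by the probability of the selection in that cluster. The data, the `clusters` structure, and all function names are hypothetical, and the paper's actual network structure and combination rules may differ.

```python
import hashlib

WIDTH = 8  # number of counters in the toy single-row sketch (assumption)


def _h(key, salt):
    """Deterministic 64-bit hash. Hypothetical helper."""
    digest = hashlib.blake2b(f"{salt}:{key}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def sketch(freqs):
    """Single-row Fast-AGMS-style sketch of a key -> weight mapping."""
    out = [0.0] * WIDTH
    for key, weight in freqs.items():
        sign = 1 if _h(key, "sign") & 1 else -1
        out[_h(key, "bucket") % WIDTH] += sign * weight
    return out


def scale(vec, c):
    return [c * v for v in vec]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


# Toy SPN over N = 8 rows: one sum node with two children (row clusters).
# Each child is a product node with two independent univariate leaves.
N = 8
clusters = [
    {"weight": 0.5, "key_dist": {1: 0.5, 2: 0.5},
     "color_dist": {"red": 0.5, "blue": 0.5}},
    {"weight": 0.5, "key_dist": {3: 1.0},
     "color_dist": {"red": 1.0}},
]


def approx_sketch(color):
    """Approximate the join-key sketch of the selection `color = <color>`."""
    total = [0.0] * WIDTH
    for c in clusters:
        selectivity = c["color_dist"].get(color, 0.0)  # predicate leaf
        key_sketch = sketch(c["key_dist"])             # key leaf, stored as a sketch
        # Product node: scale the key sketch by the predicate probability.
        # Sum node: weighted element-wise sum over the children.
        total = add(total, scale(key_sketch, c["weight"] * selectivity))
    return scale(total, N)  # convert probabilities back to row counts


# The rows with color = red have key frequencies {1: 1, 2: 1, 3: 4}.
exact = sketch({1: 1.0, 2: 1.0, 3: 4.0})
print(approx_sketch("red") == exact)  # True
```

The match is exact here only because the toy data satisfies the independence assumption perfectly. On real relations the learned network is an approximation, so the resulting sketch is approximate as well. In practice the leaf sketches would be built once, so answering a new selection needs only scaling and adding fixed-size vectors rather than rescanning the data.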
Brian Tsan
University of California Merced
Abylay Amanbayev
University of California Merced
Asoke Datta
University of California Merced
Florin Rusu
Department of Computer Science and Engineering, UC Merced
Databases, Approximate Query Processing, Scalable Machine Learning