Bala-Join: An Adaptive Hash Join for Balancing Communication and Computation in Geo-Distributed SQL Databases

📅 2026-03-05
🤖 AI Summary
This work addresses the performance bottleneck of distributed hash joins over wide-area networks, where data skew upsets the balance between computation and communication load. The authors propose Bala-Join, an adaptive approach that balances both loads during join execution in geo-distributed SQL databases. The core contributions are the Balanced Partition and Partial Replication (BPPR) algorithm, which redistributes skewed data through a multicast mechanism; a distributed online skewed join key detector; and the Active-Signaling and Asynchronous-Pulling (ASAP) mechanism, which synchronizes the detector with the redistribution process at minimal overhead. The empirical study reports a throughput increase of 25%–61% over popular Dist-HJ solutions.

📝 Abstract
Shared-nothing geo-distributed SQL databases, such as CockroachDB, are increasingly vital for enterprise applications requiring data resilience and locality. However, we have observed significant performance degradation in customer deployments, especially those that span multiple data centers over a Wide Area Network (WAN). Our investigation traces the bottleneck to the Distributed Hash Join (Dist-HJ) algorithm, whose performance depends on a crucial balance between communication overhead and computational load. This balance is severely disrupted when processing skewed data from real-world customer workloads, leading to the observed performance decline. To tackle this challenge, we introduce Bala-Join, an adaptive solution that balances the computation and network load in Dist-HJ execution. Our approach consists of the Balanced Partition and Partial Replication (BPPR) algorithm and a distributed online skewed join key detector. The former achieves balanced redistribution of skewed data through a multicast mechanism to improve computational performance and reduce network overhead. The latter provides real-time skewed join key information tailored to BPPR. Furthermore, an Active-Signaling and Asynchronous-Pulling (ASAP) mechanism is incorporated to enable efficient, real-time synchronization between the detector and the redistribution process with minimal overhead. An empirical study shows that Bala-Join outperforms popular Dist-HJ solutions, increasing throughput by 25%-61%.
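
The abstract does not give the details of BPPR or the detector, so the following is only a generic illustration of the underlying idea, not the authors' algorithm. It is a minimal sketch assuming an exact frequency count in place of an online detector, and a hypothetical `fanout` parameter that controls how many nodes share a skewed key. All function names are made up for this example. Tuples with ordinary keys are hash-partitioned; probe tuples with a skewed key are spread round-robin over `fanout` nodes, and the matching build tuples are replicated to exactly those nodes.

```python
from collections import Counter, defaultdict


def detect_skewed_keys(keys, threshold):
    """Return keys whose share of the input exceeds `threshold`.

    Assumption: exact counting over a finished input. An online detector
    would work incrementally on a stream instead.
    """
    counts = Counter(keys)
    total = sum(counts.values())
    return {k for k, c in counts.items() if c / total > threshold}


def targets_for(key, num_nodes, fanout):
    """Nodes that share one skewed key: the hash 'home' node and its successors."""
    fanout = min(fanout, num_nodes)  # never list the same node twice
    home = hash(key) % num_nodes
    return [(home + i) % num_nodes for i in range(fanout)]


def redistribute(build, probe, num_nodes, skewed, fanout):
    """Assign (key, payload) tuples of both join inputs to nodes."""
    build_parts, probe_parts = defaultdict(list), defaultdict(list)
    seen = Counter()  # per-key round-robin position
    for key, payload in probe:
        if key in skewed:
            nodes = targets_for(key, num_nodes, fanout)
            node = nodes[seen[key] % len(nodes)]
            seen[key] += 1
        else:
            node = hash(key) % num_nodes
        probe_parts[node].append((key, payload))
    for key, payload in build:
        if key in skewed:
            nodes = targets_for(key, num_nodes, fanout)  # partial replication
        else:
            nodes = [hash(key) % num_nodes]
        for node in nodes:
            build_parts[node].append((key, payload))
    return build_parts, probe_parts


def local_join(build_rows, probe_rows):
    """Plain in-memory hash join run independently on each node."""
    table = defaultdict(list)
    for key, payload in build_rows:
        table[key].append(payload)
    return [(k, b, p) for k, p in probe_rows for b in table.get(k, [])]
```

In this sketch a larger `fanout` spreads the probe work for a hot key over more nodes but sends more copies of the build tuples across the network, which is the communication-computation trade-off the abstract refers to. Because each probe tuple lands on exactly one node that holds every build tuple for its key, the union of the per-node results equals the result of a single-node join.
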

Problem

Research questions and friction points this paper is trying to address.

geo-distributed databases
skewed data
distributed hash join
communication-computation balance
performance degradation

Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive hash join
data skew
geo-distributed databases
balanced partitioning
real-time skew detection
Wenlong Song
School of Cyber Engineering, Xidian University
Hui Li
Professor, School of Computer Science and Technology, Xidian University
Database, Data Mining, Social Network, Data Security and Privacy
Bingying Zhai
School of Computer Science and Technology, Xidian University
Jinxin Yang
School of Cyber Engineering, Xidian University
Pinghui Wang
Xi'an Jiaotong University
Luming Sun
Yunxi Technology Company Ltd.
Ming Li
Shandong Inspur Database Technology Company Ltd.
Jiangtao Cui
School of Computer Science and Technology, Xidian University