PolarStore: High-Performance Data Compression for Large-Scale Cloud-Native Databases

📅 2025-11-25
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
Cloud-native RDBMSs face a fundamental trade-off in compression: software-based approaches incur substantial performance overhead, while hardware-based solutions lack the flexibility that diverse database workloads require. To resolve this tension, the paper proposes PolarStore, a compressed shared storage system built on a software-hardware co-design with two compression layers. Its core contributions are: combining PolarCSD's in-storage hardware compression with lightweight software compression; database-oriented optimizations on critical I/O paths; hardware improvements to PolarCSD for host-level stability; and a compression-aware scheduling scheme for cluster-level space efficiency. Deployed on thousands of storage servers within PolarDB and managing over 100 PB of data, the system achieves a compression ratio of 3.55×, reducing storage costs by approximately 60% while maintaining performance comparable to uncompressed clusters.

πŸ“ Abstract
In recent years, resource elasticity and cost optimization have become essential for RDBMSs. While cloud-native RDBMSs provide elastic computing resources via disaggregated computing and storage, storage costs remain a critical user concern. Consequently, data compression emerges as an effective strategy to reduce storage costs. However, existing compression approaches in RDBMSs present a stark trade-off: software-based approaches incur significant performance overheads, while hardware-based alternatives lack the flexibility required for diverse database workloads. In this paper, we present PolarStore, a compressed shared storage system for cloud-native RDBMSs. PolarStore employs a dual-layer compression mechanism that combines in-storage compression in PolarCSD hardware with lightweight compression in software. This design leverages the strengths of both approaches. PolarStore also incorporates database-oriented optimizations to maintain high performance on critical I/O paths. Drawing from large-scale deployment experiences, we also introduce hardware improvements for PolarCSD to ensure host-level stability and propose a compression-aware scheduling scheme to improve cluster-level space efficiency. PolarStore is currently deployed on thousands of storage servers within PolarDB, managing over 100 PB of data. It achieves a compression ratio of 3.55 and reduces storage costs by approximately 60%. Remarkably, these savings are achieved while maintaining performance comparable to uncompressed clusters.
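The reported compression ratio can be turned into a space-saving figure with a short calculation. The sketch below assumes the usual definition of compression ratio (uncompressed size divided by compressed size); the function names and the 100 TB dataset are hypothetical and chosen only for illustration.

```python
def space_saving(compression_ratio: float) -> float:
    """Fraction of space saved, assuming ratio = uncompressed size / compressed size."""
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return 1.0 - 1.0 / compression_ratio


def physical_size(logical_size: float, compression_ratio: float) -> float:
    """Space occupied after compression, in the same unit as logical_size."""
    return logical_size / compression_ratio


ratio = 3.55          # compression ratio reported in the abstract
logical_tb = 100.0    # hypothetical dataset size, not a figure from the paper

print(f"space saving: {space_saving(ratio):.1%}")
print(f"physical footprint: {physical_size(logical_tb, ratio):.1f} TB")
```

Under this definition, a ratio of 3.55 corresponds to a raw space saving of about 72%. The abstract reports a storage cost reduction of approximately 60% and does not break down the difference, so the sketch does not attempt to model cost.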
Problem

Research questions and friction points this paper is trying to address.

Reducing storage costs in cloud-native databases through compression
Overcoming performance-flexibility trade-off in existing compression methods
Achieving high compression ratios without compromising system performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dual-layer compression combining hardware and software
Database-oriented optimizations for high I/O performance
Compression-aware scheduling for cluster-level space efficiency

Qingda Hu, Alibaba Cloud Computing
Xinjun Yang, Alibaba Cloud Computing
Feifei Li, Alibaba Cloud Computing
Junru Li, Alibaba Cloud Computing
Ya Lin, Alibaba Cloud Computing
Yuqi Zhou, Alibaba Cloud Computing
Yicong Zhu, Alibaba Cloud Computing
Junwei Zhang, Alibaba Cloud Computing
Rongbiao Xie, Alibaba Cloud Computing
Ling Zhou, Alibaba Cloud Computing
Bin Wu, Alibaba Cloud Computing
Wenchao Zhou, Georgetown University
Databases, Networking, Systems, Security