Scavenger+: Revisiting Space-Time Tradeoffs in Key-Value Separated LSM-trees

📅 2025-08-19
🤖 AI Summary
KV-separated LSM-trees alleviate write amplification but incur severe space amplification, which is especially harmful in cost-sensitive deployments. Existing garbage collection (GC) strategies neither adapt to workload characteristics nor account for the non-negligible space overhead of the index LSM-tree. To improve the space-time trade-off, this paper proposes Scavenger+, which makes three contributions: (1) an I/O-efficient garbage collection scheme; (2) a space-aware compaction strategy based on compensated size; and (3) a dynamic GC scheduler that adapts to system load. Experimental evaluation shows that, compared to BlobDB, Titan, and TerarkDB, Scavenger+ achieves up to 37% lower space amplification while sustaining high write throughput.

📝 Abstract
Key-Value Stores (KVS) based on log-structured merge-trees (LSM-trees) are widely used in storage systems but face significant challenges, such as high write amplification caused by compaction. KV-separated LSM-trees address write amplification but introduce significant space amplification, a critical concern in cost-sensitive scenarios. Garbage collection (GC) can reduce space amplification, but existing strategies are often inefficient and fail to account for workload characteristics. Moreover, current key-value (KV) separated LSM-trees overlook the space amplification caused by the index LSM-tree. In this paper, we systematically analyze the sources of space amplification in KV-separated LSM-trees and propose Scavenger+, which achieves a better performance-space trade-off. Scavenger+ introduces (1) an I/O-efficient garbage collection scheme to reduce I/O overhead, (2) a space-aware compaction strategy based on compensated size to mitigate index-induced space amplification, and (3) a dynamic GC scheduler that adapts to system load to make better use of CPU and storage resources. Extensive experiments demonstrate that Scavenger+ significantly improves write performance and reduces space amplification compared to state-of-the-art KV-separated LSM-trees, including BlobDB, Titan, and TerarkDB.
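The abstract names two sources of space amplification: stale values that GC has not yet reclaimed, and overhead in the index LSM-tree. The sketch below is only an illustration of how the two combine, using the common definition of space amplification as physical bytes divided by logical (live) bytes. The class, field names, and numbers are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class StoreUsage:
    """Hypothetical byte counts for a KV-separated store (illustrative only)."""
    live_value_bytes: int     # values still referenced by the index
    garbage_value_bytes: int  # stale values not yet reclaimed by GC
    index_live_bytes: int     # newest index entry for each live key
    index_stale_bytes: int    # obsolete index entries and tombstones awaiting compaction


def space_amplification(u: StoreUsage) -> float:
    """Physical bytes on disk divided by logical (live) bytes."""
    logical = u.live_value_bytes + u.index_live_bytes
    physical = logical + u.garbage_value_bytes + u.index_stale_bytes
    return physical / logical


if __name__ == "__main__":
    usage = StoreUsage(
        live_value_bytes=800,
        garbage_value_bytes=300,
        index_live_bytes=200,
        index_stale_bytes=100,
    )
    print(space_amplification(usage))
```

Under this accounting, reclaiming value garbage alone cannot bring the ratio down to 1.0 while stale index entries remain, which is the gap the abstract attributes to existing KV-separated designs.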
Problem

Research questions and friction points this paper is trying to address.

Reducing space amplification in KV-separated LSM-trees
Improving garbage collection efficiency for storage systems
Addressing index-induced space amplification in LSM-trees
Innovation

Methods, ideas, or system contributions that make the work stand out.

I/O-efficient garbage collection scheme
Space-aware compaction with compensated size
Dynamic GC scheduler adapting to system load
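This page does not define "compensated size". As a rough sketch of one possible reading, the example below assumes that each index file's size is inflated by the bytes of the separated values it points to, so that compaction scoring sees the index as if values were still stored inline. The `IndexFile` type, its fields, and the scoring rule are all assumptions made for illustration, not the paper's algorithm.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class IndexFile:
    """Hypothetical index SST descriptor (not from the paper)."""
    name: str
    file_bytes: int              # on-disk size: keys plus value pointers
    referenced_value_bytes: int  # assumed: total size of the values it points to


def compensated_size(f: IndexFile) -> int:
    """Assumed definition: index bytes plus the bytes of referenced values."""
    return f.file_bytes + f.referenced_value_bytes


def level_score(files: Iterable[IndexFile], target_bytes: int, compensated: bool) -> float:
    """Ratio of level size to its target; a score above 1.0 means the level is due for compaction."""
    if compensated:
        total = sum(compensated_size(f) for f in files)
    else:
        total = sum(f.file_bytes for f in files)
    return total / target_bytes


if __name__ == "__main__":
    level = [IndexFile("a.sst", 10, 90), IndexFile("b.sst", 20, 180)]
    print(level_score(level, target_bytes=200, compensated=False))
    print(level_score(level, target_bytes=200, compensated=True))
```

With raw sizes the small index files stay far below the level target and are rarely compacted, so stale entries linger. With the assumed compensation the same level exceeds its target and becomes eligible for compaction sooner.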
Jianshun Zhang
Wuhan National Laboratory for Optoelectronics, Key Laboratory of Information Storage System, Engineering Research Center of Data Storage Systems and Technology, Ministry of Education of China, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
Fang Wang
Postdoc, Stanford University
Reading acquisition, dyslexia, cross-linguistic research, bilingualism, cognitive neuroscience
Jiaxin Ou
ByteDance
Yi Wang
ByteDance
Ming Zhao
ByteDance
Sheng Qiu
ByteDance
Junxun Huang
Wuhan National Laboratory for Optoelectronics, Key Laboratory of Information Storage System, Engineering Research Center of Data Storage Systems and Technology, Ministry of Education of China, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
Baoquan Li
Wuhan National Laboratory for Optoelectronics, Key Laboratory of Information Storage System, Engineering Research Center of Data Storage Systems and Technology, Ministry of Education of China, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China
Peng Fang
Huazhong University of Science and Technology
Heterogeneous Architecture, Graph Learning, Big Data Analysis
Dan Feng
Wuhan National Laboratory for Optoelectronics, Key Laboratory of Information Storage System, Engineering Research Center of Data Storage Systems and Technology, Ministry of Education of China, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, Hubei 430074, China