Scavenger: Better Space-Time Trade-Offs for Key-Value Separated LSM-trees

📅 2025-08-19
🤖 AI Summary
KV-separated LSM-trees reduce write amplification but incur severe space amplification, which is particularly costly in cost-sensitive deployments. Existing garbage collection (GC) policies do not account for workload characteristics and neglect space bloat in the index LSM-tree itself. This paper proposes Scavenger, which contributes: (1) a systematic analysis of the sources of space amplification in KV-separated LSM-trees; (2) an I/O-efficient GC scheme that reduces GC I/O overhead; and (3) a space-aware compaction strategy based on “compensated size” that limits space amplification in the index LSM-tree. Evaluated against BlobDB, Titan, and TerarkDB, Scavenger achieves lower space amplification while improving write performance, giving a better trade-off between storage efficiency and performance.

📝 Abstract
Key-value stores (KVS) built on the log-structured merge-tree (LSM-tree) are widely used in storage systems. However, the compaction process causes high write amplification. KV-separated LSM-trees address this issue, but they introduce substantial space amplification, which cannot be overlooked in cost-sensitive scenarios. Garbage collection (GC) holds significant promise for reducing space amplification, yet existing GC strategies often perform poorly because they do not thoroughly consider workload characteristics. In addition, current KV-separated LSM-trees ignore the adverse effect of space amplification in the index LSM-tree. In this paper, we systematically analyze the sources of space amplification in KV-separated LSM-trees and introduce Scavenger, which achieves a better trade-off between performance and space amplification. Scavenger first proposes an I/O-efficient garbage collection scheme to reduce I/O overhead, and then incorporates a space-aware compaction strategy based on compensated size to minimize the space amplification of the index LSM-tree. Extensive experiments show that Scavenger significantly improves write performance and achieves lower space amplification than other KV-separated LSM-trees (including BlobDB, Titan, and TerarkDB).
Problem

Research questions and friction points this paper is trying to address.

Reducing space amplification in KV-separated LSM-trees
Optimizing garbage collection for workload characteristics
Minimizing index LSM-tree space amplification effects
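To make the first problem concrete, here is a minimal toy model of why separating values from the index can inflate space usage. It is not Scavenger's design or any real engine's layout: the class `ToyKVSeparatedStore`, its methods, and the definition of space amplification as stored value bytes divided by live value bytes are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyKVSeparatedStore:
    """Hypothetical toy model: an index maps keys to slots in an append-only value log."""

    index: dict = field(default_factory=dict)      # key -> position in value_log
    value_log: list = field(default_factory=list)  # append-only (key, value) records

    def put(self, key: str, value: bytes) -> None:
        # An overwrite appends a new record; the old record stays behind as garbage.
        self.value_log.append((key, value))
        self.index[key] = len(self.value_log) - 1

    def live_bytes(self) -> int:
        # Bytes of values that the index still points to.
        return sum(len(self.value_log[pos][1]) for pos in self.index.values())

    def stored_bytes(self) -> int:
        # Bytes of all values in the log, live or stale.
        return sum(len(value) for _, value in self.value_log)

    def space_amplification(self) -> float:
        # Assumed definition for this sketch: stored value bytes / live value bytes.
        return self.stored_bytes() / self.live_bytes()

    def garbage_collect(self) -> None:
        # Rewrite only the records that the index still references.
        live = [(key, self.value_log[pos][1]) for key, pos in self.index.items()]
        self.value_log = live
        self.index = {key: i for i, (key, _) in enumerate(live)}


if __name__ == "__main__":
    store = ToyKVSeparatedStore()
    for _ in range(3):
        store.put("a", b"x" * 100)  # two of these three records become garbage
    store.put("b", b"y" * 100)
    print(store.space_amplification())  # 2.0
    store.garbage_collect()
    print(store.space_amplification())  # 1.0
```

In this toy, GC reclaims everything at once by rewriting the live records, which is exactly the I/O cost that a real GC policy has to weigh against the space it recovers.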
Innovation

Methods, ideas, or system contributions that make the work stand out.

I/O-efficient garbage collection scheme
Space-aware compaction strategy
Compensated size optimization
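The summary does not define how Scavenger computes compensated size, so the following is only a sketch of the general idea of size compensation in compaction picking. It assumes that an index file's compensated size is its on-disk size plus an estimate of the stale bytes its entries account for. The names `IndexFile`, `stale_value_bytes`, `compensated_size`, and `pick_compaction_candidate` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IndexFile:
    name: str
    size_bytes: int         # on-disk size of the index file
    stale_value_bytes: int  # hypothetical estimate of stale bytes tied to this file


def compensated_size(f: IndexFile) -> int:
    # Assumption for this sketch: raw size plus a penalty for stale data,
    # so that small files hiding a lot of garbage are not overlooked.
    return f.size_bytes + f.stale_value_bytes


def pick_compaction_candidate(files: List[IndexFile]) -> IndexFile:
    # Choose the file with the largest compensated size.
    return max(files, key=compensated_size)


if __name__ == "__main__":
    files = [
        IndexFile("a.sst", size_bytes=64, stale_value_bytes=0),
        IndexFile("b.sst", size_bytes=32, stale_value_bytes=100),
        IndexFile("c.sst", size_bytes=48, stale_value_bytes=10),
    ]
    print(max(files, key=lambda f: f.size_bytes).name)  # a.sst (raw size only)
    print(pick_compaction_candidate(files).name)        # b.sst (compensated size)
```

Ranking by raw size alone would pick `a.sst`, while the compensated ranking picks `b.sst`, the smallest file but the one associated with the most reclaimable space in this made-up input.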
Jianshun Zhang
Wuhan National Laboratory for Optoelectronics, Key Laboratory of Information Storage System, Engineering Research Center of Data Storage Systems and Technology, Ministry of Education of China, School of Computer Science and Technology, Huazhong University of Science and Technology, China
Fang Wang
Postdoc, Stanford University
Reading acquisition, dyslexia, cross-linguistic research, bilingualism, cognitive neuroscience
Sheng Qiu
ByteDance Inc.
Yi Wang
ByteDance Inc.
Jiaxin Ou
ByteDance Inc.
Junxun Huang
Wuhan National Laboratory for Optoelectronics, Key Laboratory of Information Storage System, Engineering Research Center of Data Storage Systems and Technology, Ministry of Education of China, School of Computer Science and Technology, Huazhong University of Science and Technology, China
Baoquan Li
Wuhan National Laboratory for Optoelectronics, Key Laboratory of Information Storage System, Engineering Research Center of Data Storage Systems and Technology, Ministry of Education of China, School of Computer Science and Technology, Huazhong University of Science and Technology, China
Peng Fang
Huazhong University of Science and Technology
Heterogeneous Architecture, Graph Learning, Big Data Analysis
Dan Feng
Wuhan National Laboratory for Optoelectronics, Key Laboratory of Information Storage System, Engineering Research Center of Data Storage Systems and Technology, Ministry of Education of China, School of Computer Science and Technology, Huazhong University of Science and Technology, China