🤖 AI Summary
KV-separated LSM-trees reduce write amplification but incur substantial space amplification, which is a real concern in cost-sensitive deployments. Existing garbage collection (GC) strategies do not thoroughly account for workload characteristics, and current designs ignore space amplification in the index LSM-tree itself. This paper proposes Scavenger, which contributes (1) a systematic analysis of the sources of space amplification in KV-separated LSM-trees; (2) an I/O-efficient GC scheme that reduces GC I/O overhead; and (3) a space-aware compaction strategy based on "compensated size" that minimizes space amplification in the index LSM-tree. In experiments against BlobDB, Titan, and TerarkDB, Scavenger significantly improves write performance while achieving lower space amplification, giving a better trade-off between performance and storage efficiency.
📝 Abstract
Key-value stores (KVS) built on log-structured merge-trees (LSM-trees) have gained widespread acceptance in storage systems. Nonetheless, the compaction process causes high write amplification, which is a significant challenge. While KV-separated LSM-trees successfully tackle this issue, they introduce substantial space amplification, a concern that cannot be overlooked in cost-sensitive scenarios. Garbage collection (GC) holds significant promise for reducing space amplification, yet existing GC strategies often fall short because they lack thorough consideration of workload characteristics. Additionally, current KV-separated LSM-trees ignore the adverse effect of space amplification in the index LSM-tree. In this paper, we systematically analyze the sources of space amplification in KV-separated LSM-trees and introduce Scavenger, which achieves a better trade-off between performance and space amplification. Scavenger first proposes an I/O-efficient garbage collection scheme to reduce I/O overhead, and then incorporates a space-aware compaction strategy based on compensated size to minimize the space amplification of the index LSM-tree. Extensive experiments show that Scavenger significantly improves write performance and achieves lower space amplification than other KV-separated LSM-trees (including BlobDB, Titan, and TerarkDB).
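To make the two quantities in the abstract concrete, the following is a minimal sketch and not Scavenger's implementation. It assumes the common definition of space amplification as physical bytes on disk divided by live logical bytes. The `StoreState` fields and the `compensated_size` formula are hypothetical illustrations; the abstract does not define how compensated size is computed.

```python
from dataclasses import dataclass


@dataclass
class StoreState:
    """Hypothetical byte counts for a KV-separated store (illustrative only)."""
    live_value_bytes: int    # values still referenced by the newest key versions
    stale_value_bytes: int   # overwritten or deleted values not yet reclaimed by GC
    live_index_bytes: int    # newest key -> value-pointer entries in the index LSM-tree
    stale_index_bytes: int   # obsolete index entries and tombstones awaiting compaction


def space_amplification(s: StoreState) -> float:
    """Physical bytes on disk divided by live logical bytes."""
    physical = (s.live_value_bytes + s.stale_value_bytes
                + s.live_index_bytes + s.stale_index_bytes)
    logical = s.live_value_bytes + s.live_index_bytes
    return physical / logical


def compensated_size(index_file_bytes: int, pointed_value_bytes: int) -> int:
    """Assumed formula: weight a small index file by the value bytes its entries
    point to, so that compaction scheduling can see the space it pins down."""
    return index_file_bytes + pointed_value_bytes


state = StoreState(live_value_bytes=900, stale_value_bytes=600,
                   live_index_bytes=100, stale_index_bytes=100)
print(space_amplification(state))   # garbage in both the value store and the index counts
print(compensated_size(64, 4096))   # a small index file that pins much larger values
```

In this toy accounting, garbage in the value store and obsolete entries in the index both inflate the ratio, which is why the abstract treats GC and index compaction as two separate levers.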