Scavenger+: Revisiting Space-Time Tradeoffs in Key-Value Separated LSM-trees

📅 2025-08-19

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

KV-separated LSM-trees alleviate write amplification but incur severe space amplification, which is particularly harmful in cost-sensitive deployments. Existing garbage collection (GC) strategies are often inefficient and do not adapt to workload characteristics, and current designs overlook the space overhead of the index LSM-tree. To improve the space-time trade-off, this paper proposes Scavenger+, which has three key components: (1) an I/O-efficient garbage collection scheme; (2) a space-aware compaction strategy based on compensated size; and (3) a dynamic GC scheduler that adapts to system load. Experimental evaluation reports that, compared to BlobDB, Titan, and TerarkDB, Scavenger+ achieves up to 37% lower space amplification while sustaining high write throughput.

📝 Abstract

Key-Value Stores (KVS) based on log-structured merge-trees (LSM-trees) are widely used in storage systems but face significant challenges, such as high write amplification caused by compaction. KV-separated LSM-trees address write amplification but introduce significant space amplification, a critical concern in cost-sensitive scenarios. Garbage collection (GC) can reduce space amplification, but existing strategies are often inefficient and fail to account for workload characteristics. Moreover, current key-value (KV) separated LSM-trees overlook the space amplification caused by the index LSM-tree. In this paper, we systematically analyze the sources of space amplification in KV-separated LSM-trees and propose Scavenger+, which achieves a better performance-space trade-off. Scavenger+ introduces (1) an I/O-efficient garbage collection scheme to reduce I/O overhead, (2) a space-aware compaction strategy based on compensated size to mitigate index-induced space amplification, and (3) a dynamic GC scheduler that adapts to system load to make better use of CPU and storage resources. Extensive experiments demonstrate that Scavenger+ significantly improves write performance and reduces space amplification compared to state-of-the-art KV-separated LSM-trees, including BlobDB, Titan, and TerarkDB.

Problem

Research questions and friction points this paper is trying to address.

Reducing space amplification in KV-separated LSM-trees

Improving garbage collection efficiency for storage systems

Addressing index-induced space amplification in LSM-trees
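To make the quantity being reduced concrete, here is a minimal sketch of how space amplification is commonly computed: total bytes on disk divided by the bytes of live (latest-version) data. The `StoreSizes` type, its field names, and the `index_share` helper are hypothetical and chosen only for illustration; the paper may measure these quantities differently.

```python
from dataclasses import dataclass


@dataclass
class StoreSizes:
    """Hypothetical size snapshot of a KV-separated store (all values in bytes)."""
    index_bytes: int      # on-disk size of the index LSM-tree (keys + value pointers)
    value_log_bytes: int  # on-disk size of the value files, including garbage
    live_bytes: int       # logical size of the latest version of every KV pair


def space_amplification(s: StoreSizes) -> float:
    """Total physical footprint divided by live logical data."""
    if s.live_bytes <= 0:
        raise ValueError("live_bytes must be positive")
    return (s.index_bytes + s.value_log_bytes) / s.live_bytes


def index_share(s: StoreSizes) -> float:
    """Fraction of the physical footprint taken by the index LSM-tree."""
    total = s.index_bytes + s.value_log_bytes
    if total <= 0:
        raise ValueError("store is empty")
    return s.index_bytes / total


sizes = StoreSizes(index_bytes=20, value_log_bytes=130, live_bytes=100)
amp = space_amplification(sizes)
share = index_share(sizes)
```

With these made-up numbers the store occupies 150 bytes for 100 bytes of live data, and the index accounts for 20 of those 150 bytes, which illustrates why index-induced overhead is tracked separately from value-file garbage.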

Innovation

Methods, ideas, or system contributions that make the work stand out.

I/O-efficient garbage collection scheme

Space-aware compaction with compensated size

Dynamic GC scheduler adapting to system load
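The idea behind compaction driven by compensated size can be sketched as follows. This is an illustrative assumption, not the paper's definition: here an index file's compensated size is taken to be its physical size plus an estimate of the value bytes that its obsolete entries still pin in the value store, so that small index files holding many stale pointers are compacted earlier. The names `IndexFile`, `stale_value_bytes`, `compensated_size`, and `pick_compaction_candidate` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IndexFile:
    """Hypothetical descriptor of one index SST in a KV-separated LSM-tree."""
    name: str
    file_bytes: int         # physical size of the index file
    stale_value_bytes: int  # assumed estimate of value bytes pinned by obsolete entries


def compensated_size(f: IndexFile) -> int:
    """Assumed definition: physical size plus reclaimable value bytes."""
    return f.file_bytes + f.stale_value_bytes


def pick_compaction_candidate(files: List[IndexFile]) -> Optional[IndexFile]:
    """Pick the file with the largest compensated size, or None if there are no files."""
    if not files:
        return None
    return max(files, key=compensated_size)


files = [
    IndexFile("a.sst", file_bytes=100, stale_value_bytes=0),
    IndexFile("b.sst", file_bytes=40, stale_value_bytes=500),
]
chosen = pick_compaction_candidate(files)
```

In this example `b.sst` is chosen even though it is physically smaller, because compacting it would expose more reclaimable value space. A policy based on raw file size alone would pick `a.sst` instead.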

🔎 Similar Papers

LearnedKV: Integrating LSM and Learned Index for Superior Performance on Storage