Buffered Partially-Persistent External-Memory Search Trees

📅 2025-03-11
🤖 AI Summary
This paper addresses the I/O-efficiency challenge of partially persistent search trees in external memory. It presents an I/O-optimal solution that supports updates on the current version and queries on any historical version, with all operation costs depending only on the size $N_v$ of the accessed version $v$. The approach integrates buffering with partial persistence within the $B^{\varepsilon}$-tree framework, introducing a timestamped hierarchical persistent index and dynamic buffer management. The bounds are amortized and can be made worst-case, provided the internal memory has size $M = \Omega\big(B^{1-\varepsilon}\log_2(\max_v N_v)\big)$. Insertions and deletions incur $O\big(\frac{1}{\varepsilon B^{1-\varepsilon}} \log_B N_v\big)$ I/Os; successor and range queries require $O\big(\frac{1}{\varepsilon} \log_B N_v + K/B\big)$ I/Os, where $K$ is the output size. The data structure uses space linear in the total number of updates. Its update bound improves on the previous state of the art, the JEA 2003 structure of Arge, Danner and Teh, which supports all operations in $O(\log_B N_v + K/B)$ I/Os.
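The bounds can be evaluated for concrete parameters to show the trade-off they describe. The sketch below drops all hidden constants, because the real constants are not given. The helpers `update_bound`, `query_bound` and `btree_bound` are hypothetical illustrations and are not part of the paper.

```python
import math


def _log_b(n_v: int, block: int) -> float:
    # log_B(N_v); the bounds assume a block size B >= 2 and N_v >= 1.
    if block < 2 or n_v < 1:
        raise ValueError("need B >= 2 and N_v >= 1")
    return math.log(n_v, block)


def _check_eps(eps: float) -> None:
    # The trade-off parameter must satisfy 0 < eps < 1.
    if not 0 < eps < 1:
        raise ValueError("need 0 < eps < 1")


def update_bound(n_v: int, block: int, eps: float) -> float:
    """Insert/delete cost, constants dropped: log_B(N_v) / (eps * B^(1-eps))."""
    _check_eps(eps)
    return _log_b(n_v, block) / (eps * block ** (1 - eps))


def query_bound(n_v: int, block: int, eps: float, k: int = 0) -> float:
    """Successor/range query cost, constants dropped: log_B(N_v) / eps + K / B."""
    _check_eps(eps)
    return _log_b(n_v, block) / eps + k / block


def btree_bound(n_v: int, block: int, k: int = 0) -> float:
    """B-tree-style cost of the earlier JEA 2003 structure: log_B(N_v) + K / B."""
    return _log_b(n_v, block) + k / block


# Example: N_v = 2^30 elements, B = 1024, eps = 1/2, so log_B N_v = 3.
N_v, B, eps = 2**30, 1024, 0.5
print(f"{update_bound(N_v, B, eps):.4f}")  # 0.1875  (3 / (0.5 * 32))
print(f"{query_bound(N_v, B, eps):.4f}")   # 6.0000
print(f"{btree_bound(N_v, B):.4f}")        # 3.0000
```

With these parameters, an update costs 16 times fewer I/Os than the B-tree-style bound. In exchange, the logarithmic query term is twice the B-tree's.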

📝 Abstract
We present an optimal partially-persistent external-memory search tree with amortized I/O bounds matching those achieved by the non-persistent $B^{\varepsilon}$-tree by Brodal and Fagerberg [SODA 2003]. In a partially-persistent data structure each update creates a new version of the data structure, where all past versions can be queried, but only the current version can be updated. All operations should be efficient with respect to the size $N_v$ of the accessed version $v$. For any parameter $0<\varepsilon<1$, our data structure supports insertions and deletions in amortized $O\!\left(\frac{1}{\varepsilon B^{1-\varepsilon}}\log_B N_v\right)$ I/Os, where $B$ is the external-memory block size. It also supports successor and range reporting queries in amortized $O\!\left(\frac{1}{\varepsilon}\log_B N_v+K/B\right)$ I/Os, where $K$ is the number of values reported. The space usage of the data structure is linear in the total number of updates. We make the standard and minimal assumption that the internal memory has size $M \geq 2B$. The previous state-of-the-art external-memory partially-persistent search tree by Arge, Danner and Teh [JEA 2003] supports all operations in worst-case $O\!\left(\log_B N_v+K/B\right)$ I/Os, matching the bounds achieved by the classical B-tree by Bayer and McCreight [Acta Informatica 1972]. Our data structure successfully combines buffering updates with partial persistence. The I/O bounds can also be achieved in the worst-case sense, by slightly modifying our data structure and under the requirement that the memory size $M = \Omega\!\left(B^{1-\varepsilon}\log_2(\max_v N_v)\right)$. The worst-case result slightly improves the memory requirement over the previous ephemeral external-memory dictionary by Das, Iacono, and Nekrich (ISAAC 2022), who achieved matching worst-case I/O bounds but required $M=\Omega\!\left(B\log_B N\right)$.
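The partial-persistence semantics defined above can be shown with a deliberately naive in-memory sketch. This is not the paper's data structure. The class name `PartiallyPersistentSet` is hypothetical. Each key records the version intervals during which it is present, which is an assumed timestamp scheme. A query on version $v$ may scan keys that are absent from $v$, so its cost depends on the total number of keys ever inserted rather than on $N_v$. That dependence is exactly what the paper avoids.

```python
import bisect
import math


class PartiallyPersistentSet:
    """Toy partially persistent sorted set (in memory, not I/O-efficient).

    Every update creates a new version. Versions 0..current can be queried,
    but only the current version can be updated. Each key stores its
    lifespans as [start, end) version intervals (end = inf while present).
    """

    def __init__(self):
        self.current = 0   # version 0 is the empty set
        self._keys = []    # sorted list of every key ever inserted
        self._spans = {}   # key -> list of [start, end) lifespans

    def _alive(self, key, v):
        # A key is in version v if some lifespan contains v.
        return any(start <= v < end for start, end in self._spans.get(key, ()))

    def _check(self, v):
        if not 0 <= v <= self.current:
            raise ValueError(f"version {v} does not exist")

    def insert(self, key):
        """Insert into the current version; returns the new version number."""
        self.current += 1
        if not self._alive(key, self.current - 1):
            if key not in self._spans:
                bisect.insort(self._keys, key)
                self._spans[key] = []
            self._spans[key].append([self.current, math.inf])  # open lifespan
        return self.current

    def delete(self, key):
        """Delete from the current version; returns the new version number."""
        self.current += 1
        if self._alive(key, self.current - 1):
            self._spans[key][-1][1] = self.current  # close the open lifespan
        return self.current

    def successor(self, x, v):
        """Smallest key >= x present in version v, or None."""
        self._check(v)
        for key in self._keys[bisect.bisect_left(self._keys, x):]:
            if self._alive(key, v):
                return key
        return None

    def range_report(self, lo, hi, v):
        """All keys in [lo, hi] present in version v, in sorted order."""
        self._check(v)
        start = bisect.bisect_left(self._keys, lo)
        stop = bisect.bisect_right(self._keys, hi)
        return [k for k in self._keys[start:stop] if self._alive(k, v)]


s = PartiallyPersistentSet()
v1 = s.insert(10)                  # version 1: {10}
v2 = s.insert(20)                  # version 2: {10, 20}
v3 = s.delete(10)                  # version 3: {20}
print(s.successor(5, v2))          # 10
print(s.successor(5, v3))          # 20
print(s.range_report(0, 100, v1))  # [10]
```

Old versions stay queryable after later updates, but `insert` and `delete` only ever modify the newest version. That is the "partial" in partial persistence.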
Problem

Optimal partially-persistent external-memory search tree design.
Efficient updates and queries across multiple data versions.
Improved I/O bounds and memory requirements for persistent structures.
Innovation

Optimal partially-persistent external-memory search tree
Amortized I/O bounds matching the non-persistent $B^{\varepsilon}$-tree
Combines buffering updates with partial persistence
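The update bound can be traced to the standard amortized argument for the non-persistent $B^{\varepsilon}$-tree, outlined below. This follows the classical ephemeral analysis, not the paper's persistent construction. It assumes every internal node has fanout $f = B^{\varepsilon}$ and a buffer of $B$ pending updates stored in $O(1)$ blocks, and it writes $N$ for the number of stored elements. When a buffer is full, flushing only to the child that receives the most pending updates moves at least $B/f$ of them for $O(1)$ I/Os. Each update is therefore charged $O(f/B)$ I/Os per level, over a tree of height $h$:

$$
\begin{aligned}
f &= B^{\varepsilon}, \qquad h = \log_f N = \frac{\log_B N}{\varepsilon},\\
\text{amortized cost per update} &= O\!\left(\frac{f}{B}\cdot h\right) = O\!\left(\frac{B^{\varepsilon}}{B}\cdot\frac{\log_B N}{\varepsilon}\right) = O\!\left(\frac{1}{\varepsilon B^{1-\varepsilon}}\log_B N\right).
\end{aligned}
$$

A successor query follows one root-to-leaf path of $h$ nodes, which gives the $O\!\left(\frac{1}{\varepsilon}\log_B N\right)$ query term. The persistent bounds stated in the abstract have the same form, with $N$ replaced by $N_v$.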