🤖 AI Summary
This paper addresses two weaknesses of learning-augmented binary search trees: the difficulty of maintaining balance across updates, and the inability to handle partially ordered or high-dimensional data. To avoid them, it proposes learning-augmented skip lists and KD trees, analyzed within a single theoretical framework. The approach feeds machine-learned predictions of query frequencies into the construction of these structures. When the predictions are accurate, the expected search time is optimal to within nearly a factor of two; when they are accurate only to within a constant factor, it remains optimal up to a constant factor. The structures are also robust: even under arbitrarily inaccurate predictions, the expected search time stays within a constant factor of a classical, prediction-oblivious skip list or KD tree. The analysis combines probabilistic modeling with proofs of robustness to prediction error. Experiments on synthetic and real-world datasets show that the learning-augmented structures outperform their traditional counterparts. The core contribution is a provably near-optimal and robust learning-augmented design for search data structures that also handles high-dimensional and partially ordered data.
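Neither the summary nor the abstract spells out the construction. The sketch below is our own assumption, not the authors' algorithm: it shows one way frequency predictions could be folded into a randomized skip list. Each item's height is the larger of the usual coin-flip height and a prediction-based floor of floor(log2(p * n)). With the top level near log2(n), an item with predicted frequency p then sits roughly log2(1/p) levels below the top. Because the predictions sum to at most 1, at most n / 2^L items can receive a floor of L or more. Wrong predictions can therefore add only a bounded number of items to each level, which is one intuition for why such a scheme might stay within a constant factor of an oblivious skip list. The names `predicted_level`, `random_level`, `assign_heights`, and `levels` are hypothetical.

```python
import math
import random


def predicted_level(p, n):
    # Hypothetical prediction-based floor on an item's height:
    # floor(log2(p * n)), or 0 when p * n < 1.
    # Since predictions sum to at most 1, at most n / 2**L keys have
    # p >= 2**L / n, i.e. a floor of L or more.
    if p * n < 1:
        return 0
    return int(math.floor(math.log2(p * n)))


def random_level(rng, max_level):
    # Classical skip-list height: promote with probability 1/2 per level.
    level = 0
    while level < max_level and rng.random() < 0.5:
        level += 1
    return level


def assign_heights(freqs, seed=0):
    # freqs maps each key to its predicted fractional query frequency.
    n = len(freqs)
    if n == 0:
        return {}
    max_level = max(1, math.ceil(math.log2(n)))
    rng = random.Random(seed)
    heights = {}
    for key, p in freqs.items():
        # Taking the max keeps the oblivious coin-flip height as a fallback:
        # predictions can only raise an item, never lower it.
        h = max(random_level(rng, max_level), predicted_level(p, n))
        heights[key] = min(h, max_level)  # clamp in case predictions exceed 1
    return heights


def levels(heights):
    # Sorted keys present at each level, as a skip list would link them.
    top = max(heights.values(), default=0)
    return [sorted(k for k, h in heights.items() if h >= lvl)
            for lvl in range(top + 1)]


if __name__ == "__main__":
    freqs = {"hot": 0.9, **{f"k{i}": 0.1 / 7 for i in range(7)}}
    for lvl, keys in enumerate(levels(assign_heights(freqs))):
        print(lvl, keys)
```

With a uniform prediction p = 1/n every floor is 0, and the sketch reduces to an ordinary skip list.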
📝 Abstract
We study the integration of machine learning advice to improve upon traditional data structures designed for efficient search queries. Although there has been recent effort in improving the performance of binary search trees using machine learning advice, e.g., Lin et al. (ICML 2022), the resulting constructions nevertheless suffer from inherent weaknesses of binary search trees, such as the complexity of maintaining balance across multiple updates and the inability to handle partially ordered or high-dimensional datasets. For these reasons, we focus on skip lists and KD trees in this work. Given access to a possibly erroneous oracle that outputs estimated fractional frequencies for search queries on a set of items, we construct skip lists and KD trees that provably provide the optimal expected search time to within nearly a factor of two. In fact, our learning-augmented skip lists and KD trees remain optimal up to a constant factor even if the oracle is accurate only to within a constant factor. We also demonstrate robustness by showing that our data structures achieve an expected search time within a constant factor of an oblivious skip list/KD tree construction even when the predictions are arbitrarily incorrect. Finally, we empirically show that our learning-augmented search data structures outperform their corresponding traditional analogs on both synthetic and real-world datasets.
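The abstract does not describe how predictions enter the KD tree. As a hedged illustration of how frequency-based speedups and robustness can coexist, the sketch below builds a point KD tree under our own assumptions; it is not the paper's construction. Each node stores the weighted-median point along the current axis. The weights average the normalized predictions with the uniform distribution, a standard robustness device that we swap in here. Each child subtree then carries at most half of its parent's weight, so with total weight 1 a stored point of weight w is found by exact-match search after at most 1 + log2(1/w) node visits. That gives at most 2 + log2(1/p) visits for an item with accurate prediction p, and at most 2 + log2(n) however wrong the predictions are. The names `Node`, `mixed_weights`, `build`, and `search` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    point: Point
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def mixed_weights(preds: Dict[Point, float]) -> List[Tuple[Point, float]]:
    # Hypothetical robustness step: average the normalized predictions with
    # the uniform distribution, so every weight is at least 1 / (2n).
    n = len(preds)
    total = sum(preds.values()) or 1.0
    return [(pt, 0.5 * p / total + 0.5 / n) for pt, p in preds.items()]


def build(items: List[Tuple[Point, float]], depth: int = 0) -> Optional[Node]:
    # Point KD tree whose node stores the weighted median along the current
    # axis, so each child subtree holds at most half of the node's weight.
    if not items:
        return None
    axis = depth % len(items[0][0])
    # Break coordinate ties by the full point to get a strict total order.
    items = sorted(items, key=lambda it: (it[0][axis], it[0]))
    total = sum(w for _, w in items)
    acc = 0.0
    i = 0
    for i, (_, w) in enumerate(items):
        acc += w
        if acc >= total / 2:  # first index whose prefix reaches half the weight
            break
    return Node(items[i][0], axis,
                build(items[:i], depth + 1),
                build(items[i + 1:], depth + 1))


def search(root: Optional[Node], q: Point) -> Optional[int]:
    # Exact-match search; returns the number of nodes visited, or None.
    node, visited = root, 0
    while node is not None:
        visited += 1
        if node.point == q:
            return visited
        a = node.axis
        # Same (coordinate, point) order that build() used to split.
        node = node.left if (q[a], q) < (node.point[a], node.point) else node.right
    return None


if __name__ == "__main__":
    pts = [(float(x), float(y)) for x in range(4) for y in range(4)]
    hot = (2.0, 1.0)
    preds = {p: (0.9 if p == hot else 0.1 / 15) for p in pts}
    root = build(mixed_weights(preds))
    print(search(root, hot), max(search(root, p) for p in pts))
```

Mixing with the uniform distribution costs at most one extra level in the depth bound for accurate predictions. In exchange, it caps the worst case near the depth of a balanced tree.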