🤖 AI Summary
This paper addresses quantile estimation over distributed and streaming data by refining ReqSketch, a mergeable quantile summary with relative error guarantees, using so-called adaptive compactors. Compared with the original ReqSketch, the refined sketch retains the same space bound, update time, and algorithmic simplicity, yet significantly simplifies the proof of its relative error guarantee under full mergeability, avoiding the intricate charging argument and complex variance analysis of the original. In addition, the adaptivity of the sketch, together with the proof technique, yields near-optimal space bounds in specific scenarios, particularly when merging sketches of comparable size. ReqSketch itself is already used in practice through its implementation in the Apache DataSketches library, and quantile summaries of this kind suit datasets that are distributed across multiple machines or generated by sensor networks.
📝 Abstract
Quantile summaries provide a scalable way to estimate the distribution of individual attributes in large datasets that are often distributed across multiple machines or generated by sensor networks. ReqSketch (arXiv:2004.01668) is currently the most space-efficient summary with two key properties: relative error guarantees, offering increasingly higher accuracy towards the distribution's tails, and mergeability, allowing distributed or parallel processing of datasets. Due to these features and its simple algorithm design, ReqSketch has been adopted in practice, via implementation in the Apache DataSketches library. However, the proof of mergeability in ReqSketch is overly complicated, requiring an intricate charging argument and complex variance analysis.
In this paper, we provide a refined version of ReqSketch by developing so-called adaptive compactors. This refinement enables a significantly simplified proof of relative error guarantees in the most general mergeability setting, while retaining the original space bound, update time, and algorithmic simplicity. Moreover, the adaptivity of our sketch, together with the proof technique, yields near-optimal space bounds in specific scenarios, particularly when merging sketches of comparable size.
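To make the two properties named in the abstract concrete (higher accuracy towards one tail, and mergeability), here is a minimal Python sketch of a compactor hierarchy. It is an illustration under stated assumptions, not ReqSketch and not the adaptive compactors of this paper: the class name `ToyRelativeSketch`, the fixed `capacity`, and the rule "keep the smallest half of a full buffer, compact the rest" are all hypothetical simplifications. The toy carries none of the paper's error or space guarantees.

```python
import random


class ToyRelativeSketch:
    """Toy compactor hierarchy (hypothetical; not ReqSketch itself).

    An item stored at level h stands for 2**h input items.
    """

    def __init__(self, capacity=32, seed=0):
        if capacity < 4:
            raise ValueError("capacity must be at least 4")
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.levels = [[]]  # levels[h] is the buffer of weight-2**h items

    def update(self, x):
        """Insert one item from the stream."""
        self.levels[0].append(x)
        self._compress()

    def merge(self, other):
        """Absorb another sketch by concatenating buffers level by level."""
        for h, buf in enumerate(other.levels):
            while len(self.levels) <= h:
                self.levels.append([])
            self.levels[h].extend(buf)
        self._compress()

    def _compress(self):
        h = 0
        while h < len(self.levels):
            buf = self.levels[h]
            if len(buf) >= self.capacity:
                buf.sort()
                # Assumption of this toy: the smallest half is never compacted,
                # so low ranks stay accurate.
                keep = self.capacity // 2
                if (len(buf) - keep) % 2:
                    keep += 1  # compact an even number of items
                tail = buf[keep:]
                # Promote every other item of the tail, starting at a random
                # offset; each promoted item doubles its weight.
                promoted = tail[self.rng.randrange(2)::2]
                del buf[keep:]
                if h + 1 == len(self.levels):
                    self.levels.append([])
                self.levels[h + 1].extend(promoted)
            h += 1

    def rank(self, y):
        """Estimated number of inserted items that are <= y."""
        return sum(
            (1 << h) * sum(1 for x in buf if x <= y)
            for h, buf in enumerate(self.levels)
        )

    def total_weight(self):
        """Total weight represented by the sketch (equals the item count)."""
        return sum((1 << h) * len(buf) for h, buf in enumerate(self.levels))
```

Two such sketches built on separate machines can be combined with `a.merge(b)`, after which `a.rank(y)` estimates the rank of `y` in the union of both inputs. Because each compaction replaces an even number of items by half as many items of double weight, the total weight is preserved exactly, and with distinct inputs the ranks of the smallest `capacity // 2` items are exact in this toy.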