🤖 AI Summary
To avoid the cost of fully recomputing DBSCAN on dynamic data streams, this paper proposes a dynamic DBSCAN algorithm based on Euler Tour Trees (ETTs), which the summary describes as the first of its kind. It reformulates density-based clustering as a dynamic graph maintenance problem, so that a single-point insertion or deletion triggers only local work: incremental neighborhood density updates and adaptive ε-neighborhood management. Theoretically, the algorithm achieves a time complexity of *O(d log³n + log⁴n)* per update, where *n* is the total number of updates and *d* is the data dimension, which is far cheaper than static recomputation. Empirical evaluation demonstrates several-fold throughput improvement while maintaining or even improving clustering quality. By integrating ETTs into density-based clustering, the work provides near-optimal density estimation and real-time clustering support for large-scale, evolving spatiotemporal data streams.
📝 Abstract
We propose a fast, dynamic algorithm for Density-Based Spatial Clustering of Applications with Noise (DBSCAN) that efficiently supports online updates. Traditional DBSCAN algorithms, designed for batch processing, become computationally expensive when applied to dynamic datasets, particularly in large-scale applications where data continuously evolves. To address this challenge, our algorithm leverages the Euler Tour Tree data structure, enabling dynamic clustering updates without reprocessing the entire dataset. This approach preserves the near-optimal accuracy in density estimation achieved by the state-of-the-art static DBSCAN method (Esfandiari et al., 2021). Our method achieves an improved time complexity of $O(d \log^3(n) + \log^4(n))$ for every data point insertion and deletion, where $n$ and $d$ denote the total number of updates and the data dimension, respectively. Empirical studies also demonstrate significant speedups over conventional DBSCAN implementations in real-time clustering of dynamic datasets, while maintaining comparable or superior clustering quality.
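To make the idea of clustering as dynamic graph maintenance concrete, the following is a minimal Python sketch, not the paper's algorithm. It treats clusters as connected components of a graph over core points and touches only the ε-neighborhood of the inserted or deleted point. The class name `NaiveDynamicDBSCAN` and all of its methods are hypothetical. Brute-force neighbor search and a plain graph traversal stand in for the density estimation and Euler Tour Trees used in the paper, so each operation here costs time linear in the number of points or worse, not the polylogarithmic bound stated above.

```python
from collections import defaultdict
import math


class NaiveDynamicDBSCAN:
    """Toy dynamic DBSCAN: clusters are connected components of a core-point graph.

    Assumption: brute-force search and graph traversal replace the paper's
    Euler Tour Trees, so this illustrates the structure, not the complexity.
    """

    def __init__(self, eps, min_pts):
        self.eps = eps
        self.min_pts = min_pts
        self.points = {}               # point id -> coordinates
        self.nbrs = defaultdict(set)   # point id -> ids within eps (excluding itself)

    def _is_core(self, pid):
        # A point counts itself toward min_pts, as in standard DBSCAN.
        return len(self.nbrs[pid]) + 1 >= self.min_pts

    def insert(self, pid, coords):
        # Only the eps-neighborhood of the new point changes.
        for qid, q in self.points.items():
            if math.dist(coords, q) <= self.eps:
                self.nbrs[pid].add(qid)
                self.nbrs[qid].add(pid)
        self.points[pid] = coords

    def delete(self, pid):
        # Remove the point and its edges; neighbors may lose core status.
        for qid in self.nbrs.pop(pid, set()):
            self.nbrs[qid].discard(pid)
        del self.points[pid]

    def labels(self):
        """Return {point id: cluster label}, with -1 marking noise."""
        labels = {}
        next_label = 0
        for pid in self.points:
            if pid in labels or not self._is_core(pid):
                continue
            # Traverse from this core point; only core points propagate the label.
            labels[pid] = next_label
            stack = [pid]
            while stack:
                cur = stack.pop()
                for nb in self.nbrs[cur]:
                    if nb not in labels:
                        labels[nb] = next_label
                        if self._is_core(nb):
                            stack.append(nb)
            next_label += 1
        for pid in self.points:
            labels.setdefault(pid, -1)
        return labels
```

In this sketch, `labels()` recomputes the components on every call. A dynamic connectivity structure such as an Euler Tour Tree would instead maintain a spanning forest across updates, so that asking whether two core points share a cluster does not require a traversal.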