Dynamic DBSCAN with Euler Tour Sequences

📅 2025-03-11
🤖 AI Summary
To avoid rerunning DBSCAN from scratch whenever a dynamic dataset changes, this paper proposes a dynamic DBSCAN algorithm built on Euler Tour Trees (ETTs). It recasts density-based clustering as a dynamic graph maintenance problem, so that a single-point insertion or deletion triggers only local updates to neighborhood densities and cluster connectivity instead of a full recomputation. The algorithm runs in *O(d log³n + log⁴n)* time per update, where n is the total number of updates and d is the data dimension. It also preserves the near-optimal density-estimation accuracy of the state-of-the-art static DBSCAN method (Esfandiari et al., 2021). Experiments show significant speedups over conventional DBSCAN on dynamic datasets, with comparable or better clustering quality. The work brings ETTs to density-based clustering and supports real-time clustering of large, continuously evolving datasets.
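
The graph view of DBSCAN mentioned above can be illustrated on a static point set. In standard DBSCAN, a point is a core point if its ε-neighborhood holds at least `min_pts` points; clusters are the connected components of the graph that joins core points lying within ε of each other, and each non-core point within ε of a core point is attached to that core point's cluster. The Python sketch below assumes Euclidean distance and brute-force neighbor search; the name `dbscan_graph` and the use of union-find are illustrative choices, not the paper's algorithm, which maintains this connectivity under updates rather than recomputing it.

```python
import math


def dbscan_graph(points, eps, min_pts):
    """Static DBSCAN phrased as connectivity on the eps-graph of core points.

    Brute-force O(n^2) neighbor search, for illustration only.
    Returns one label per point: a cluster id >= 0, or -1 for noise.
    """
    n = len(points)
    # Each eps-neighborhood includes the point itself, as in standard DBSCAN.
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    core = [len(nbrs[i]) >= min_pts for i in range(n)]

    # Union-find: core points within eps of each other share a component.
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(n):
        if core[i]:
            for j in nbrs[i]:
                if core[j]:
                    parent[find(i)] = find(j)

    # Each connected component of core points becomes one cluster.
    labels = [-1] * n
    ids = {}
    for i in range(n):
        if core[i]:
            labels[i] = ids.setdefault(find(i), len(ids))
    # A non-core point within eps of a core point is a border point and joins
    # that core point's cluster; all other points stay labeled as noise.
    for i in range(n):
        if not core[i]:
            for j in nbrs[i]:
                if core[j]:
                    labels[i] = labels[j]
                    break
    return labels
```

For example, `dbscan_graph([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)], eps=1.5, min_pts=3)` returns `[0, 0, 0, 1, 1, 1, -1]`: two clusters of three core points each, plus one noise point. Inserting or deleting a single point can change which points are core and therefore which components exist, which is why rebuilding all of this after every update is expensive.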

📝 Abstract
We propose a fast and dynamic algorithm for Density-Based Spatial Clustering of Applications with Noise (DBSCAN) that efficiently supports online updates. Traditional DBSCAN algorithms, designed for batch processing, become computationally expensive when applied to dynamic datasets, particularly in large-scale applications where data continuously evolves. To address this challenge, our algorithm leverages the Euler Tour Tree data structure, enabling dynamic clustering updates without reprocessing the entire dataset. This approach preserves the near-optimal accuracy in density estimation achieved by the state-of-the-art static DBSCAN method (Esfandiari et al., 2021). Our method achieves an improved time complexity of $O(d \log^3 n + \log^4 n)$ for every data point insertion and deletion, where $n$ and $d$ denote the total number of updates and the data dimension, respectively. Empirical studies also demonstrate significant speedups over conventional DBSCAN algorithms in real-time clustering of dynamic datasets, while maintaining comparable or superior clustering quality.
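
Euler Tour Trees maintain a dynamic forest under edge insertions (link), edge deletions (cut), and connectivity queries. The Python sketch below uses the textbook Euler-tour representation: each tree is a cyclic sequence containing a loop arc (v, v) for every vertex and both arcs (u, v) and (v, u) for every tree edge. Linking two trees rotates each tour to start at an endpoint and concatenates them around the new edge; cutting an edge splits the tour at its two arcs. It is a hedged illustration, not the paper's implementation: the class and method names are hypothetical, and plain Python lists make each operation O(n), whereas storing the sequences in balanced search trees is what yields O(log n) per operation.

```python
class EulerTourForest:
    """Dynamic forest connectivity via Euler tour sequences (list-based sketch).

    Each tree is a cyclic sequence of arcs: a loop (v, v) per vertex and
    both (u, v) and (v, u) per tree edge. Plain lists make every operation
    O(n); balanced search trees over the same sequences give O(log n).
    """

    def __init__(self):
        self.tour = {}  # vertex -> the list object holding its tree's tour

    def add_vertex(self, v):
        self.tour[v] = [(v, v)]

    def connected(self, u, v):
        # Two vertices are connected iff they share the same tour object.
        return self.tour[u] is self.tour[v]

    def _reroot(self, v):
        # Rotate the cyclic tour in place so that it starts at loop (v, v).
        t = self.tour[v]
        i = t.index((v, v))
        t[:] = t[i:] + t[:i]
        return t

    def _assign(self, seq):
        # Every vertex of a tree is the tail of its own loop arc.
        for a, _ in seq:
            self.tour[a] = seq

    def link(self, u, v):
        if self.connected(u, v):
            raise ValueError("link would create a cycle")
        tu, tv = self._reroot(u), self._reroot(v)
        self._assign(tu + [(u, v)] + tv + [(v, u)])

    def cut(self, u, v):
        # The tour reads A, arc, B, arc, C: B is one side, A + C the other.
        t = self.tour[u]
        i, j = sorted((t.index((u, v)), t.index((v, u))))
        self._assign(t[i + 1:j])
        self._assign(t[:i] + t[j + 1:])
```

In a DBSCAN setting, such a forest could span the graph of core points, so that two core points share a cluster exactly when `connected` returns true. A forest alone does not cover deleting a spanning-tree edge whose endpoints stay connected through other graph edges; fully dynamic connectivity structures additionally search for a replacement edge in that case.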
Problem

Dynamic DBSCAN for online updates in evolving datasets
Efficient clustering without reprocessing entire dataset
Improved time complexity for data point insertion/deletion
Innovation

Dynamic DBSCAN built on Euler Tour Trees
Local updates on each insertion or deletion, without reprocessing the entire dataset
O(d log³n + log⁴n) time per insertion or deletion

Seiyun Shin, University of Illinois Urbana-Champaign
Ilan Shomorony, University of Illinois Urbana-Champaign (Information Theory, Computational Biology, Machine Learning, Communications)
Peter Macgregor, University of St Andrews