DBSCAN in domains with periodic boundary conditions

📅 2025-01-28
🤖 AI Summary
Clustering data under periodic boundary conditions (e.g., circles, tori, 3D periodic boxes) poses challenges for standard density-based algorithms like DBSCAN, which assume Euclidean geometry and fail to respect translational or cyclic symmetries. Method: We propose the first seamless periodic extension of DBSCAN, introducing a periodic distance metric and coordinate mapping scheme, coupled with a neighborhood overlap handling strategy. Crucially, it preserves DBSCAN’s core logic while remaining compatible with efficient spatial indexes (e.g., KDTree, BallTree), retaining an overall O(N log N) time complexity. Contribution/Results: The method natively models translational and cyclic symmetry within DBSCAN, without data replication or boundary reconstruction. We validate its robustness and improved clustering accuracy on synthetic datasets in 1D–3D and on real-world turbulent bubble trajectory data. The implementation is released as an open-source, plug-and-play Python package.

📝 Abstract
Many scientific problems involve data embedded in a space with periodic boundary conditions. This can, for instance, arise from an inherent cyclic or rotational symmetry in the data or from a spatially extended periodicity. When analyzing such data, well-tailored methods are needed to obtain efficient approaches that obey the periodic boundary conditions of the problem. In this work, we present a method for applying a clustering algorithm to data embedded in a periodic domain based on the DBSCAN algorithm, a widely used unsupervised machine learning method that identifies clusters in data. The proposed method internally leverages the conventional DBSCAN algorithm for domains with open boundaries, such that it remains compatible with all optimized implementations for neighborhood searches in open domains. In this way, it retains the same optimized runtime complexity of $O(N \log N)$. We demonstrate the workings of the proposed method using synthetic data in one, two and three dimensions and also apply it to a real-world example involving the clustering of bubbles in a turbulent flow. The proposed approach is implemented in a ready-to-use Python package that we make publicly available.
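
To make the problem concrete, the sketch below shows what "obeying the periodic boundary conditions" means for DBSCAN. It is not the method of the paper and not the API of its package: it is a brute-force $O(N^2)$ reference that swaps the Euclidean distance for a minimum-image distance. The function names `periodic_distance` and `dbscan_periodic_bruteforce`, the `box` argument (the period along each axis), and the example data are all assumptions made for illustration. Each point counts as its own neighbor.

```python
from collections import deque


def periodic_distance(a, b, box):
    """Minimum-image Euclidean distance between a and b in a periodic box.

    `box` holds the period along each axis (hypothetical argument name).
    """
    total = 0.0
    for x, y, length in zip(a, b, box):
        d = abs(x - y) % length
        d = min(d, length - d)  # take the shorter way around the axis
        total += d * d
    return total ** 0.5


def dbscan_periodic_bruteforce(points, eps, min_samples, box):
    """Plain DBSCAN with an O(N^2) neighbor search; -1 marks noise."""
    n = len(points)
    # Neighbor lists include the point itself (distance 0 <= eps).
    neighbors = [
        [j for j in range(n)
         if periodic_distance(points[i], points[j], box) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(neighbors[i])
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # j is a core point: keep expanding
    return labels


# Four points straddle the boundary of the unit interval; one sits far away.
points = [(0.01,), (0.03,), (0.97,), (0.99,), (0.5,)]
labels = dbscan_periodic_bruteforce(points, eps=0.05, min_samples=3, box=(1.0,))
print(labels)  # [0, 0, 0, 0, -1]
```

With open boundaries, the same four points split into two pairs that are too small to form a cluster, so every point is labeled noise. The quadratic neighbor search here is only for readability; the abstract's point is that the proposed method avoids this cost by reusing open-boundary DBSCAN and its optimized neighbor searches.
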
Problem

Research questions and friction points this paper is trying to address.

DBSCAN
circular boundaries
data clustering
Innovation

Methods, ideas, or system contributions that make the work stand out.

DBSCAN Clustering
Periodic Boundary Conditions
Optimized Neighbor Search