FAMST: Fast Approximate Minimum Spanning Tree Construction for Large-Scale and High-Dimensional Data

📅 2025-07-18
🤖 AI Summary
To address the computational bottleneck of constructing minimum spanning trees (MSTs) on large-scale, high-dimensional data, this paper proposes FAMST, a three-phase approximation algorithm: (1) construct an approximate nearest neighbor (ANN) graph, (2) connect the graph's disconnected components with inter-component edges, and (3) iteratively refine the edge set. For $n$ points in $d$ dimensions with $k$ nearest neighbors considered per point, it runs in $O(dn \log n)$ time and $O(dn + kn)$ space, compared with $O(n^2)$ time and space for traditional methods. Experiments across diverse datasets show low approximation error and speedups of up to 1000× over exact MST algorithms, which makes MST-based analysis practical on datasets with millions of points and thousands of dimensions.

📝 Abstract
We present Fast Approximate Minimum Spanning Tree (FAMST), a novel algorithm that addresses the computational challenges of constructing Minimum Spanning Trees (MSTs) for large-scale and high-dimensional datasets. FAMST utilizes a three-phase approach: Approximate Nearest Neighbor (ANN) graph construction, ANN inter-component connection, and iterative edge refinement. For a dataset of $n$ points in a $d$-dimensional space, FAMST achieves $\mathcal{O}(dn \log n)$ time complexity and $\mathcal{O}(dn + kn)$ space complexity when $k$ nearest neighbors are considered, which is a significant improvement over the $\mathcal{O}(n^2)$ time and space complexity of traditional methods. Experiments across diverse datasets demonstrate that FAMST achieves remarkably low approximation errors while providing speedups of up to $1000\times$ compared to exact MST algorithms. We analyze how the key hyperparameters, $k$ (neighborhood size) and $\lambda$ (inter-component edges), affect performance, providing practical guidelines for hyperparameter selection. FAMST enables MST-based analysis on datasets with millions of points and thousands of dimensions, extending the applicability of MST techniques to problem scales previously considered infeasible.
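
The three phases named in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `knn_graph` uses brute-force search as a stand-in for a real ANN index, `refine` is an assumed greedy local search, every pair of components is linked (a simplification that only makes sense for a handful of components), and Kruskal's algorithm extracts the final tree. All function names, and the parameter `lam` standing in for $\lambda$, are hypothetical.

```python
import math
import random


def knn_graph(X, k):
    """Brute-force k-NN lists; a stand-in for a real ANN index (assumption)."""
    n = len(X)
    return [
        sorted((j for j in range(n) if j != i),
               key=lambda j: math.dist(X[i], X[j]))[:k]
        for i in range(n)
    ]


class DSU:
    """Union-find, used for component detection and for Kruskal."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[ra] = rb
        return True


def refine(X, nbrs, u, v):
    """Assumed refinement: slide either endpoint to one of its k-NN
    neighbours for as long as the edge (u, v) keeps getting shorter."""
    best = math.dist(X[u], X[v])
    improved = True
    while improved:
        improved = False
        for cand in nbrs[u]:
            if math.dist(X[cand], X[v]) < best:
                u, best, improved = cand, math.dist(X[cand], X[v]), True
        for cand in nbrs[v]:
            if math.dist(X[u], X[cand]) < best:
                v, best, improved = cand, math.dist(X[u], X[cand]), True
    return u, v


def approx_mst(X, k=5, lam=3, seed=0):
    rng = random.Random(seed)
    n = len(X)
    # Phase 1: neighbour graph, stored as an undirected edge set.
    nbrs = knn_graph(X, k)
    edges = {(min(i, j), max(i, j)) for i in range(n) for j in nbrs[i]}
    # Phase 2: find the connected components of that graph.
    dsu = DSU(n)
    for i, j in edges:
        dsu.union(i, j)
    comps = {}
    for i in range(n):
        comps.setdefault(dsu.find(i), []).append(i)
    groups = list(comps.values())
    # Phases 2-3: lam random edges per component pair, each one refined.
    for a in range(len(groups)):
        for b in range(a + 1, len(groups)):
            for _ in range(lam):
                u, v = refine(X, nbrs, rng.choice(groups[a]), rng.choice(groups[b]))
                edges.add((min(u, v), max(u, v)))
    # Extract a spanning tree of the sparse graph with Kruskal.
    kruskal = DSU(n)
    weighted = sorted((math.dist(X[i], X[j]), i, j) for i, j in edges)
    return [(i, j, w) for w, i, j in weighted if kruskal.union(i, j)]


# Two well-separated synthetic clusters (demo data only).
rng = random.Random(1)
X = [[rng.gauss(c, 1.0) for _ in range(8)] for c in (0.0, 20.0) for _ in range(30)]
tree = approx_mst(X, k=4, lam=3)
print(len(tree), round(sum(w for _, _, w in tree), 3))
```

Because the tree is built from a subset of all pairwise edges, its total weight can only match or exceed that of the exact MST. In this sketch, calling `approx_mst(X, k=len(X) - 1)` uses the complete graph and therefore gives the exact tree for comparison.
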
Problem

Research questions and friction points this paper is trying to address.

Efficient MST construction for large-scale datasets
Reducing time and space complexity in high dimensions
Enabling MST analysis on previously infeasible scales
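
To make the space argument concrete, the leading terms of $O(n^2)$ and $O(dn + kn)$ can be compared with a short back-of-the-envelope calculation. The byte sizes (8-byte floats and indices) and the values of `n`, `d`, and `k` below are illustrative assumptions, not figures from the paper, and big-O notation hides constant factors that a real implementation would add.

```python
def dense_matrix_bytes(n, float_bytes=8):
    """Memory for a full n x n pairwise distance matrix."""
    return n * n * float_bytes


def sparse_graph_bytes(n, d, k, float_bytes=8, index_bytes=8):
    """Memory for the raw data (n x d) plus k (index, distance) pairs per point."""
    return n * d * float_bytes + n * k * (index_bytes + float_bytes)


n, d, k = 10**6, 1000, 20  # illustrative sizes, not taken from the paper
print(dense_matrix_bytes(n) / 1e12, "TB for the dense matrix")
print(sparse_graph_bytes(n, d, k) / 1e9, "GB for data plus k-NN lists")
```

Under these assumptions the dense matrix is terabyte-scale while the data plus neighbor lists stay in the gigabyte range, which is the gap the second and third bullets refer to.
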
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Approximate Nearest Neighbor graph construction
Implements ANN inter-component connection phase
Applies iterative edge refinement for accuracy