🤖 AI Summary
When reducing the dimensionality of high-dimensional data, the initial similarity graph is unreliable because of the “curse of dimensionality” and information sparsity. This impedes cluster separation, especially as dataset size increases. To address this, we propose LocalMAP, a novel algorithm that introduces a dynamic local subgraph extraction and online update mechanism. LocalMAP achieves fine-grained, adaptive refinement of the adjacency graph through embedding-driven subgraph sampling, local neighborhood-aware adaptive reweighting, and iterative graph optimization. Compared with conventional methods (e.g., t-SNE, UMAP), LocalMAP recovers cluster structure more accurately. On large-scale transcriptomic datasets, it disentangles biologically meaningful subpopulations that other methods confound, identifying cell types that prior analyses either missed or erroneously merged. LocalMAP thus offers a new approach to interpretable, scalable dimensionality reduction of high-dimensional biological data.
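The “curse of dimensionality” can be made concrete through distance concentration. As the dimension grows, a point's nearest and farthest neighbors end up at almost the same distance, so a graph built from raw distances carries little contrast. The minimal sketch below assumes points drawn uniformly from a unit hypercube. This is an illustrative choice, not a transcriptomic dataset. It uses a hypothetical helper, `relative_contrast`, to compute (d_max - d_min) / d_min.

```python
import math
import random


def relative_contrast(n_points: int, dim: int, seed: int = 0) -> float:
    """Return (d_max - d_min) / d_min for the distances from one query point
    to n_points other points.

    Assumption: all points are drawn uniformly from the unit hypercube,
    purely to illustrate distance concentration (not real data).
    """
    rng = random.Random(seed)
    query = [rng.random() for _ in range(dim)]
    dists = []
    for _ in range(n_points):
        p = [rng.random() for _ in range(dim)]
        dists.append(math.dist(query, p))  # Euclidean distance
    d_min, d_max = min(dists), max(dists)
    return (d_max - d_min) / d_min


# The ratio shrinks as the dimension grows: near and far neighbors
# become hard to tell apart from distances alone.
for dim in (2, 10, 100, 1000):
    print(dim, round(relative_contrast(200, dim), 3))
```

The printed ratio should fall steeply as `dim` increases. The exact values depend on the seed and the number of points.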
📝 Abstract
Dimension reduction (DR) algorithms have proven extremely useful for gaining insight into large-scale high-dimensional datasets, particularly for finding clusters in transcriptomic data. The initial phase of these DR methods often involves converting the original high-dimensional data into a graph in which each edge represents the similarity or dissimilarity between a pair of data points. However, this graph is frequently suboptimal because high-dimensional distances are unreliable and only limited information can be extracted from the high-dimensional data. The problem worsens as the dataset size increases. In contrast, if we reduce the size of the dataset by selecting only the points in a specific section of the embedding, the clusters observed through DR become more separable, because the extracted subgraphs are more reliable. In this paper, we introduce LocalMAP, a new dimensionality reduction algorithm that dynamically and locally adjusts the graph to address this challenge. By dynamically extracting subgraphs and updating the graph on the fly, LocalMAP can identify and separate real clusters within the data that other DR methods may overlook or combine. We demonstrate the benefits of LocalMAP through a case study on biological datasets, highlighting its utility in helping users more accurately identify clusters in real-world problems.
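The abstract relies on two ingredients. The first is turning the data into a neighbor graph. The second is re-extracting a subgraph from only the points that fall in one section of the embedding. Both can be sketched in plain Python. This is a minimal illustration under stated assumptions, not LocalMAP's implementation. The brute-force kNN graph, the circular region selector, the toy data, and the names `knn_graph`, `points_in_region`, and `local_subgraph` are all hypothetical. The abstract does not specify how LocalMAP reweights or updates edges.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Point = Sequence[float]


def knn_graph(data: Sequence[Point], k: int,
              ids: Optional[Sequence[int]] = None) -> Dict[int, List[int]]:
    """Brute-force k-nearest-neighbor graph over the points listed in ids.

    If ids is None, all points are used. Each node maps to its k nearest
    other nodes by Euclidean distance, closest first.
    """
    ids = list(range(len(data))) if ids is None else list(ids)
    graph: Dict[int, List[int]] = {}
    for i in ids:
        others = [j for j in ids if j != i]
        # sorted() is stable, so tied distances keep their index order.
        others = sorted(others, key=lambda j: math.dist(data[i], data[j]))
        graph[i] = others[:k]
    return graph


def points_in_region(embedding: Sequence[Tuple[float, float]],
                     center: Tuple[float, float], radius: float) -> List[int]:
    """Indices of points whose 2-D embedding lies within radius of center."""
    return [i for i, y in enumerate(embedding) if math.dist(y, center) <= radius]


def local_subgraph(data, embedding, center, radius, k):
    """Hypothetical simplification: rebuild the kNN graph using only the
    points selected from one region of the embedding."""
    ids = points_in_region(embedding, center, radius)
    return knn_graph(data, k, ids)


# Toy data (made up for illustration): point 5 is close to point 0 in the
# original space but has been embedded far away from points 0-2.
data = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.0, 0.2, 0.0),
        (5.0, 5.0, 5.0), (5.1, 5.0, 5.0), (0.0, 0.0, 0.05)]
embedding = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
             (3.0, 3.0), (3.2, 3.0), (5.0, 5.0)]

print(knn_graph(data, k=2)[0])  # [5, 1]
print(local_subgraph(data, embedding, (0.0, 0.0), 0.5, k=2))
# {0: [1, 2], 1: [0, 2], 2: [0, 1]}
```

In the toy data, point 5 is one of point 0's nearest neighbors in the global graph. Because it was embedded far from points 0 to 2, it is excluded from the local subgraph, and the edges there are recomputed among the selected points only. Practical tools on large datasets typically use an approximate nearest-neighbor index rather than this quadratic-time brute-force loop.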