🤖 AI Summary
This paper addresses the problem of recovering a one-dimensional total order from a noisy, locally sampled pairwise comparison matrix under a limited query budget, motivated by large-scale sequence reconstruction tasks with inherent locality, such as serial-section electron microscopy image stacks. We recast the task as reconstructing a sparse, noisy line graph and propose what is, to our knowledge, the first method that provably builds a sparse graph containing all edges needed for exact seriation using only O(N(log N + K)) oracle queries, which is near-linear in N for a fixed window K. The method is parallelizable and compatible with both binary and bounded-noise distance oracles. It follows a five-stage pipeline: random-hook Boruvka connection, iterative condensation, double-sweep BFS for a provisional order, fixed-window densification, and greedy SuperChain assembly of the final permutation. Exact recovery is proved under a top-1 margin and bounded relative noise; empirically, SuperChain still succeeds when only about 2N/3 of the true adjacencies are present. On wafer-scale serial-section electron microscopy data, our approach outperforms spectral, MST, and TSP baselines while using far fewer comparisons.
📝 Abstract
We study the recovery of a 1D order from a noisy, locally sampled pairwise comparison matrix under a tight query budget. We recast the task as reconstructing a sparse, noisy line graph and present, to our knowledge, the first method that provably builds a sparse graph containing all edges needed for exact seriation using only O(N(log N + K)) oracle queries, which is near-linear in N for a fixed window K. The approach is parallelizable and supports both binary and bounded-noise distance oracles. Our five-stage pipeline consists of: (i) a random-hook Boruvka step to connect components via short-range edges in O(N log N) queries; (ii) iterative condensation to bound the graph diameter; (iii) a double-sweep BFS to obtain a provisional global order; (iv) fixed-window densification around that order; and (v) a greedy SuperChain that assembles the final permutation. Under a simple top-1 margin and bounded relative noise, we prove exact recovery; empirically, SuperChain still succeeds when only about 2N/3 of the true adjacencies are present. On wafer-scale serial-section EM, our method outperforms spectral, MST, and TSP baselines with far fewer comparisons, and it is applicable to other locally structured sequencing tasks such as temporal snapshot ordering, archaeological seriation, and playlist/tour construction.
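To make stage (iii) concrete, the following is a minimal sketch of a generic double-sweep BFS on an already connected sparse graph. It is not the paper's implementation. The names `bfs_distances`, `double_sweep_order`, and `band_graph` are hypothetical, and the toy band graph (each item linked to its `k` nearest successors) is an assumed stand-in for the graph produced by stages (i) and (ii). The first sweep finds a node far from an arbitrary start, which approximates one end of the line; the second sweep ranks every node by hop distance from that end.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist


def double_sweep_order(adj, start):
    """Provisional order: rank nodes by BFS distance from an approximate endpoint."""
    # Sweep 1: the farthest node from an arbitrary start approximates one end.
    first = bfs_distances(adj, start)
    endpoint = max(first, key=lambda n: (first[n], n))  # ties broken by node id
    # Sweep 2: distances from that end give a coarse global ranking.
    second = bfs_distances(adj, endpoint)
    return sorted(second, key=lambda n: (second[n], n))


def band_graph(n, k):
    """Hypothetical test graph: item i is linked to items i+1 .. i+k."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, min(n, i + k + 1)):
            adj[i].add(j)
            adj[j].add(i)
    return adj


if __name__ == "__main__":
    print(double_sweep_order(band_graph(7, 1), start=3))
    print(double_sweep_order(band_graph(10, 2), start=4))
```

On a pure path graph (`k = 1`) the result is the true order up to reversal. With short-range shortcuts (`k > 1`) several nodes share a BFS level, so the order is only approximately right within each level, which is consistent with the abstract calling this order provisional and refining it in stages (iv) and (v).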