Bounding the Average Move Structure Query for Faster and Smaller RLBWT Permutations

📅 2026-02-11
🤖 AI Summary
This work addresses the space overhead and construction cost of move structures for run-length encoded Burrows–Wheeler Transform (RLBWT) permutations, such as the LF and φ mappings, whose size is proportional to the number of BWT runs \( r \). Instead of the interval-splitting balancing used by the original move structure, the authors propose length capping: truncating long intervals. Without balancing, this bounds the average query time to optimal, gives constant query time when amortized over a full traversal of a single-cycle permutation from an arbitrary starting position, and improves the worst-case query time to \( O(\log(n/r)) \). A move structure with \( r \) runs over a domain of size \( n \) is represented in \( O(r \log r + r \log(n/r)) \) bits. An \( O(r) \)-time, \( O(r) \)-space construction yields optimal-time BWT inversion and suffix array enumeration in \( O(r) \) additional working space. Experiments with the accompanying RunPerm library on large repetitive genomic collections show faster move structures and a space reduction of at least ~40% for the LF mapping.

๐Ÿ“ Abstract
The move structure represents permutations with long contiguously permuted intervals in compressed space with optimal query time. Move structures have become an important feature of compressed text indexes that use space proportional to the number of Burrows-Wheeler Transform (BWT) runs, often applied in genomics. This is thanks not only to theoretical improvements over past approaches, but also to great cache efficiency and average-case query time in practice, even without the worst-case guarantees provided by the interval-splitting balancing of the original result. In this paper, we show that an even simpler type of splitting, length capping by truncating long intervals, bounds the average move structure query time to optimal whilst obtaining a better construction time than the traditional approach. This also proves constant query time when amortized over a full traversal of a single-cycle permutation from an arbitrary starting position. Such a scheme has surprising benefits both in theory and in practice. We leverage the approach to improve the representation of any move structure with $r$ runs over a domain of size $n$ to $O(r \log r + r \log \frac{n}{r})$ bits of space. The worst-case query time is also improved to $O(\log \frac{n}{r})$ without balancing. An $O(r)$-time and $O(r)$-space construction lets us apply the method to run-length encoded BWT (RLBWT) permutations such as LF and $\phi$ to obtain optimal-time algorithms for BWT inversion and suffix array (SA) enumeration in $O(r)$ additional working space. Finally, we provide the RunPerm library, which offers flexible plug-and-play move structure support, and use it to evaluate our splitting approach. Experiments find that length capping results not only in faster move structures but also in a space reduction: at least $\sim 40\%$ for LF across large repetitive genomic collections.
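
As a rough illustration of the length capping idea described in the abstract, the following Python sketch builds a toy move structure over an explicit permutation, truncates every interval longer than a chosen cap, and answers queries with a forward scan from a stored pointer. It is not the RunPerm API or the paper's construction. The function names (`runs_of`, `cap_lengths`, `build_move`, `move`), the explicit `perm` list, the cap value, and the binary-search construction (which takes $O(r \log r)$ time rather than the $O(r)$ stated in the abstract) are all assumptions made for illustration.

```python
from bisect import bisect_right


def runs_of(perm):
    """Maximal intervals (start, length) on which perm[i + 1] == perm[i] + 1."""
    runs, start = [], 0
    for i in range(1, len(perm) + 1):
        if i == len(perm) or perm[i] != perm[i - 1] + 1:
            runs.append((start, i - start))
            start = i
    return runs


def cap_lengths(runs, cap):
    """Length capping: truncate each interval into pieces of length <= cap."""
    out = []
    for start, length in runs:
        while length > cap:
            out.append((start, cap))
            start += cap
            length -= cap
        out.append((start, length))
    return out


def build_move(perm, intervals):
    """Return (starts, images, dest) for the given intervals.

    dest[k] is the index of the interval containing perm[starts[k]].
    Binary search is used here for brevity (an assumption of this sketch).
    """
    starts = [s for s, _ in intervals]
    images = [perm[s] for s in starts]
    dest = [bisect_right(starts, q) - 1 for q in images]
    return starts, images, dest


def move(table, i, k):
    """Map position i (inside interval k) to its image.

    Returns (j, k2, steps): j == perm[i], k2 is the interval containing j,
    and steps counts how many intervals the forward scan had to skip.
    """
    starts, images, dest = table
    j = images[k] + (i - starts[k])
    k2, steps = dest[k], 0
    while k2 + 1 < len(starts) and starts[k2 + 1] <= j:
        k2 += 1
        steps += 1
    return j, k2, steps


if __name__ == "__main__":
    perm = [3, 4, 5, 6, 7, 0, 1, 2]  # toy single-cycle permutation with 2 runs
    table = build_move(perm, cap_lengths(runs_of(perm), cap=2))
    i, k = 0, 0
    for _ in range(len(perm)):  # full traversal of the cycle
        i, k, steps = move(table, i, k)
        print(i, k, steps)
```

In this sketch, an interval of length at most `cap` can overlap only a limited number of output intervals' starts, so a smaller cap shortens the forward scans at the price of more intervals. With a cap of about $n/r$, the capped table has at most $2r$ intervals, since each of the $r$ runs contributes at most one piece shorter than the cap and at most $r$ full-length pieces fit in a domain of size $n$. How the paper chooses the cap and encodes the table is not specified here.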
Problem

Research questions and friction points this paper is trying to address.

move structure
RLBWT
compressed text index
BWT runs
space efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

move structure
run-length encoded BWT
length capping
compressed text indexing
LF mapping