🤖 AI Summary
This paper resolves the existence problem of *fully non-clustered* Burrows–Wheeler transforms (BWTs) over non-binary alphabets: for any length $n > 0$ and any alphabet size $k \geq 3$, there is a necklace of length $n$ whose BWT contains exactly $n$ runs, i.e., no two adjacent symbols are identical. Such a BWT attains the maximum possible number of runs and therefore realizes the worst-case clustering behavior. The result establishes that fully non-clustered BWTs exist for *all* lengths in the non-binary setting, and the paper also gives a lower bound on the number of such words. The method draws on combinatorics on words, cyclic string theory, and structural analysis of BWT runs, leveraging necklace construction and symbol-alternation properties. This contrasts with the binary case, where the existence of infinitely many fully non-clustered BWTs is still an open problem related to Artin's conjecture on primitive roots.
📝 Abstract
We prove that for every integer $n > 0$ and for every alphabet $\Sigma_k$ of size $k \geq 3$, there exists a necklace of length $n$ whose Burrows-Wheeler Transform (BWT) is completely unclustered, i.e., it consists of exactly $n$ runs with no two consecutive equal symbols. These words represent the worst-case behavior of the BWT for clustering, since the number of BWT runs is maximized. We also establish a lower bound on their number. This contrasts with the binary case, where the existence of infinitely many completely unclustered BWTs is still an open problem, related to Artin's conjecture on primitive roots.
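To make the definition concrete, the following is a minimal sketch of how one might check whether a word has a completely unclustered BWT. It assumes the usual rotation-based definition (sort all cyclic rotations and read off the last column); the function names `bwt`, `count_runs`, `is_completely_unclustered`, and `find_unclustered` are illustrative and not taken from the paper, and the brute-force search is only a sanity check for small lengths, not the paper's construction.

```python
from itertools import product


def bwt(word: str) -> str:
    """Last column of the lexicographically sorted cyclic rotations of `word`."""
    n = len(word)
    rotations = sorted(word[i:] + word[:i] for i in range(n))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s: str) -> int:
    """Number of maximal blocks of equal consecutive symbols in `s`."""
    if not s:
        return 0
    return 1 + sum(1 for a, b in zip(s, s[1:]) if a != b)


def is_completely_unclustered(word: str) -> bool:
    """True when the BWT of `word` has as many runs as symbols."""
    return count_runs(bwt(word)) == len(word)


def find_unclustered(n: int, alphabet: str = "abc"):
    """Brute-force search (exponential in n) for a word of length n
    whose BWT is completely unclustered; returns None if none is found."""
    for letters in product(alphabet, repeat=n):
        word = "".join(letters)
        if is_completely_unclustered(word):
            return word
    return None


print(bwt("banana"), count_runs(bwt("banana")))  # nnbaaa 3 (clustered)
print(bwt("abcb"), count_runs(bwt("abcb")))      # bcab 4 (completely unclustered)
```

For example, `find_unclustered(4)` searches the 81 ternary words of length 4 and returns one whose BWT has 4 runs. The search cost grows as $k^n$, so it is useful only for checking small cases.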