Unclustered BWTs of any Length over Non-Binary Alphabets

📅 2025-08-28
🤖 AI Summary
This paper settles the existence of *fully non-clustered* (completely unclustered) Burrows–Wheeler transforms (BWTs) over non-binary alphabets. For every length $n > 0$ and every alphabet size $k \geq 3$, it shows that there exists a necklace of length $n$ whose BWT consists of exactly $n$ runs, meaning that no two adjacent symbols are equal. Such a BWT attains the maximum possible number of runs, so these words realize the worst case for BWT clustering. The paper also gives a lower bound on the number of such words. The argument draws on combinatorics on words, in particular necklaces (cyclic words) and the run structure of the BWT. In the binary case, by contrast, whether infinitely many completely unclustered BWTs exist is still an open problem, related to Artin's conjecture on primitive roots.

📝 Abstract
We prove that for every integer $n > 0$ and for every alphabet $\Sigma_k$ of size $k \geq 3$, there exists a necklace of length $n$ whose Burrows-Wheeler Transform (BWT) is completely unclustered, i.e., it consists of exactly $n$ runs with no two consecutive equal symbols. These words represent the worst-case behavior of the BWT for clustering, since the number of BWT runs is maximized. We also establish a lower bound on their number. This contrasts with the binary case, where the existence of infinitely many completely unclustered BWTs is still an open problem, related to Artin's conjecture on primitive roots.
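To make the definition in the abstract concrete, here is a minimal Python sketch that computes the BWT of a word as the last column of its sorted rotations (no end marker) and counts the runs. The function names `bwt`, `count_runs`, and `is_completely_unclustered` are illustrative and do not come from the paper, and the quadratic rotation sort is only meant for small examples.

```python
def bwt(word: str) -> str:
    """Return the last column of the sorted rotation matrix of `word`.

    Assumption: the word stands for a necklace (cyclic word), so no
    end-of-string marker is appended.
    """
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(text: str) -> int:
    """Count maximal blocks of equal consecutive symbols in `text`."""
    if not text:
        return 0
    return 1 + sum(1 for a, b in zip(text, text[1:]) if a != b)


def is_completely_unclustered(word: str) -> bool:
    """True when the BWT of `word` has as many runs as it has symbols."""
    return count_runs(bwt(word)) == len(word)


if __name__ == "__main__":
    for w in ("abcb", "banana"):
        transformed = bwt(w)
        print(w, transformed, count_runs(transformed), is_completely_unclustered(w))
```

For example, `abcb` has BWT `bcab` with 4 runs, so it is completely unclustered, whereas `banana` has BWT `nnbaaa` with only 3 runs.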
Problem

Research questions and friction points this paper is trying to address.

Proving existence of unclustered BWTs for non-binary alphabets
Establishing worst-case BWT clustering behavior with maximum runs
Contrasting non-binary results with open binary case problems
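The existence question above can be explored for small lengths with an exhaustive search. The sketch below is a brute-force check written for illustration, not the construction used in the paper; `find_unclustered` is a hypothetical helper name, and the search is exponential in $n$, so it is only practical for short words.

```python
from itertools import product
from typing import Optional


def bwt(word: str) -> str:
    """Last column of the sorted rotations of `word` (no end marker)."""
    rotations = sorted(word[i:] + word[:i] for i in range(len(word)))
    return "".join(rotation[-1] for rotation in rotations)


def count_runs(text: str) -> int:
    """Count maximal blocks of equal consecutive symbols in `text`."""
    if not text:
        return 0
    return 1 + sum(1 for a, b in zip(text, text[1:]) if a != b)


def find_unclustered(n: int, alphabet: str = "abc") -> Optional[str]:
    """Return some word of length n whose BWT has n runs, or None.

    Brute force over all len(alphabet)**n words; illustrative only.
    """
    for letters in product(alphabet, repeat=n):
        word = "".join(letters)
        if count_runs(bwt(word)) == n:
            return word
    return None


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, find_unclustered(n, "abc"))
    # Over a binary alphabet no witness exists for length 3.
    print(find_unclustered(3, "ab"))
```

Over the ternary alphabet the search returns a witness for each small length tried here, as the theorem predicts, while over the binary alphabet `ab` it returns `None` for length 3.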
Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-binary alphabet BWT construction
Maximized runs with no clustering
Lower bound on the number of completely unclustered words