🤖 AI Summary
This paper investigates the extreme sensitivity, termed a "bit catastrophe", of the Burrows–Wheeler Transform (BWT) to single-character edits (insertion, deletion, or substitution), focusing on their impact on the number $r$ of equal-letter runs in the BWT output. We construct infinite families of strings in which a single edit increases $r$ from constant to $\Theta(\log n)$, and further families in which a single edit increases $r$ additively by $\Theta(\sqrt{n})$, improving the best previously known additive lower bound of $\Omega(\log n)$. Because $r = O(1)$ in the first families, they attain the existing upper bound of $O(\log n \log r)$ on the multiplicative increase, due to Akagi, Funakoshi, and Inenaga, which shows that this bound is tight in that regime. Our analysis combines BWT theory, combinatorial string construction, and run-length modeling to characterize worst-case sensitivity: multiplicative growth of $r$ reaches $\Theta(\log n)$, while additive growth reaches $\Theta(\sqrt{n})$. These results advance the theoretical understanding of BWT robustness under minor perturbations.
📝 Abstract
A bit catastrophe, loosely defined, is when a change in just one character of a string causes a significant change in the size of the compressed string. We study this phenomenon for the Burrows-Wheeler Transform (BWT), a string transform at the heart of several of the most popular compressors and aligners today. The parameter determining the size of the compressed data is the number of equal-letter runs of the BWT, commonly denoted $r$. We exhibit infinite families of strings in which insertion, deletion, or substitution of one character increases $r$ from constant to $\Theta(\log n)$, where $n$ is the length of the string. These strings can be interpreted as examples of an increase by either a multiplicative or an additive $\Theta(\log n)$-factor. As regards the multiplicative factor, they attain the upper bound given by Akagi, Funakoshi, and Inenaga [Inf&Comput. 2023] of $O(\log n \log r)$, since here $r=O(1)$. We then give examples of strings in which insertion, deletion, or substitution of a character increases $r$ by a $\Theta(\sqrt{n})$ additive factor. These strings significantly improve the best known lower bound for an additive factor of $\Omega(\log n)$ [Giuliani et al., SOFSEM 2021].
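To make the quantity $r$ concrete, the following is a minimal Python sketch that computes the BWT of a string, counts its equal-letter runs, and searches all single-character edits for the one that yields the largest $r$. It is an illustration only, not the construction from the paper. The function names (`bwt`, `count_runs`, `single_edits`, `worst_single_edit`) are hypothetical, and the sketch assumes the BWT is taken as the last column of the sorted rotations without an end-of-string marker, which may differ from the variant the authors analyze. The quadratic rotation sort is only suitable for short strings.

```python
from itertools import groupby


def bwt(s):
    """BWT as the last column of the sorted rotations of s.

    Assumption: no end-of-string marker is appended.
    """
    if not s:
        return ""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(s):
    """Number of maximal equal-letter runs in s (the parameter r when s is a BWT)."""
    return sum(1 for _ in groupby(s))


def single_edits(s, alphabet):
    """Yield every string obtained from s by one insertion, deletion, or substitution."""
    for i in range(len(s) + 1):
        for c in alphabet:
            yield s[:i] + c + s[i:]          # insertion of c at position i
    for i in range(len(s)):
        yield s[:i] + s[i + 1:]              # deletion of s[i]
        for c in alphabet:
            if c != s[i]:
                yield s[:i] + c + s[i + 1:]  # substitution of s[i] by c


def worst_single_edit(s, alphabet):
    """Return (r, t) for a single-edit neighbour t of s that maximizes r."""
    return max((count_runs(bwt(t)), t) for t in single_edits(s, alphabet))


if __name__ == "__main__":
    word = "banana"
    print(bwt(word), count_runs(bwt(word)))  # nnbaaa 3
    print(worst_single_edit(word, "abn"))
```

Running `worst_single_edit` on short strings over a small alphabet gives a brute-force way to observe how much $r$ can change under one edit. The asymptotic gaps described above only become visible on the specific string families constructed in the paper.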