🤖 AI Summary
This paper studies how a single-character edit affects the size of the CDAWG-grammar, a grammar-based compression derived from the Compact Directed Acyclic Word Graph (CDAWG). It proves an additive upper bound: after any single-character edit of the input string, the grammar size increases by at most $4e + 4$, where $e$ is the number of edges in the CDAWG of the original string. The analysis characterizes how an insertion, deletion, or substitution changes the nodes and edges of the CDAWG and how those changes carry over to the grammar. The increase is therefore at most linear in the size of the original CDAWG. The result offers a theoretical guarantee relevant to dynamic text compression and to index updates over evolving strings.
📝 Abstract
The compact directed acyclic word graph (CDAWG) [Blumer et al. 1987] of a string is the minimal compact automaton that recognizes all the suffixes of the string. CDAWGs are known to be useful for various string tasks including text pattern searching, data compression, and pattern discovery. The CDAWG-grammar [Belazzougui & Cunial 2017] is a grammar-based text compression method built on the CDAWG. In this paper, we prove that the CDAWG-grammar size $g$ can increase by at most an additive term of $4e + 4$ after any single-character edit operation is performed on the input string, where $e$ denotes the number of edges in the corresponding CDAWG before the edit.
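Stated as an inequality, the bound reads as follows. The notation is an assumption introduced here for illustration and is not taken from the paper: $T$ is the input string, $T'$ is any string obtained from $T$ by one single-character edit, $g(\cdot)$ is the CDAWG-grammar size, and $e(\cdot)$ is the number of CDAWG edges.

```latex
% Assumed notation (not from the paper):
%   T     : the original input string
%   T'    : T after one single-character edit
%   g(.)  : size of the CDAWG-grammar of a string
%   e(.)  : number of edges in the CDAWG of a string
\[
  g(T') \;\le\; g(T) + 4\,e(T) + 4
  \qquad\Longleftrightarrow\qquad
  g(T') - g(T) \;\le\; 4\,e(T) + 4 .
\]
```

Note that the right-hand side uses $e(T)$, the edge count before the edit, so the bound can be evaluated from the original CDAWG alone.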