AI Summary
This work investigates which scattered factors of a word remain after an embedding of a given scattered factor is removed from it. It introduces the notion of "complement scattered factors", defining \( C(w, u) \) as the set of scattered factors of \( w \) that can be obtained by deleting any single embedding of the scattered factor \( u \) from \( w \). The study relates this construct to the shuffle operation, in which two words are intertwined. Alongside combinatorial results on the size of \( C(w, u) \), the authors give an algorithm that computes \( C(w, u) \) from \( w \) and \( u \) in \( O(|w| \cdot |u| \cdot \binom{w}{u}) \) time, where \( \binom{w}{u} \) is the number of embeddings of \( u \) into \( w \). They also present an algorithm that reconstructs \( u \) from \( w \) and \( C(w, u) \) in \( O(|w|^2 \binom{|w|}{|w| - |u|}) \) time, and one that reconstructs \( w \) from \( u \) and \( C(w, u) \) in \( O(|u| \cdot |w|^{|u| + 1}) \) time.
Abstract
Starting in the 1970s with the fundamental work of Imre Simon, \emph{scattered factors} (also known as subsequences or scattered subwords) have remained a consistently and heavily studied object. The majority of work on scattered factors can be split into two broad classes of problems: given a word, what information, in the form of scattered factors, is contained in it, and what is not. In this paper, we consider an intermediary problem, introducing the notion of \emph{complement scattered factors}. Given a word $w$ and a scattered factor $u$ of $w$, the set of complement scattered factors of $w$ with regard to $u$, denoted $C(w, u)$, is the set of scattered factors of $w$ that can be formed by removing an embedding of $u$ from $w$. This is closely related to the \emph{shuffle} operation, in which two words are intertwined; we extend previous work on the shuffle operator using knowledge about scattered factors. Alongside introducing these sets, we provide combinatorial results on the size of the set $C(w, u)$, an algorithm to compute the set $C(w, u)$ from $w$ and $u$ in $O(\vert w \vert \cdot \vert u \vert \cdot \binom{w}{u})$ time, where $\binom{w}{u}$ denotes the number of embeddings of $u$ into $w$, an algorithm to construct $u$ from $w$ and $C(w, u)$ in $O(\vert w \vert^2 \binom{\vert w \vert}{\vert w \vert - \vert u \vert})$ time, and an algorithm to construct $w$ from $u$ and $C(w, u)$ in $O(\vert u \vert \cdot \vert w \vert^{\vert u \vert + 1})$ time.
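To make the definition of $C(w, u)$ concrete, the following is a minimal Python sketch that enumerates every embedding of $u$ into $w$, deletes the embedded positions, and collects the remaining words. It is a naive illustration of the definition, not the authors' algorithm, and it makes no claim to match the stated running time. The function names `embeddings`, `complement_scattered_factors`, and `count_embeddings` are hypothetical and chosen here for illustration only.

```python
def embeddings(w: str, u: str):
    """Yield each embedding of u into w as a tuple of increasing positions."""
    def extend(start, k, chosen):
        if k == len(u):
            yield tuple(chosen)
            return
        # Stop early enough that the remaining letters of u still fit in w.
        for i in range(start, len(w) - (len(u) - k) + 1):
            if w[i] == u[k]:
                chosen.append(i)
                yield from extend(i + 1, k + 1, chosen)
                chosen.pop()
    yield from extend(0, 0, [])


def complement_scattered_factors(w: str, u: str) -> set:
    """Return C(w, u): the words left after deleting one embedding of u from w."""
    result = set()
    for positions in embeddings(w, u):
        removed = set(positions)
        result.add("".join(c for i, c in enumerate(w) if i not in removed))
    return result


def count_embeddings(w: str, u: str) -> int:
    """Count the embeddings of u into w by dynamic programming."""
    # ways[k] = number of embeddings of u[:k] into the prefix of w read so far.
    ways = [1] + [0] * len(u)
    for c in w:
        # Iterate k downwards so each letter of w is used at most once per embedding.
        for k in range(len(u), 0, -1):
            if u[k - 1] == c:
                ways[k] += ways[k - 1]
    return ways[len(u)]
```

For example, with $w = abab$ and $u = ab$ there are three embeddings (positions 0,1; 0,3; and 2,3 counted from zero), and deleting them leaves $ab$, $ba$, and $ab$, so $C(w, u) = \{ab, ba\}$. Distinct embeddings can yield the same complement, which is why the size of $C(w, u)$ can be smaller than the number of embeddings.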