Linear Time Subsequence and Supersequence Regex Matching

📅 2025-04-22
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper investigates regex matching and universal validation under six string relations: subsequence, supersequence, infix, prefix, left-extension, and extension. Methodologically, it introduces the first integrated algorithm combining automaton state compression, dynamic programming pruning, and relation-closure optimization. This achieves linear-time matching, $O(|w| + |r|)$, for the subsequence and supersequence relations, compared with the $O(|w| \cdot |r|)$ bound for classical regex matching. For quantitative variants such as longest and shortest matching, it gives $O(|w| \cdot |r|)$ algorithms and shows that $O(|w| + |r|)$ is conditionally not achievable. Furthermore, the paper provides a complete computational complexity classification for both matching and universal validation across all six relations, and confirms the tightness of these classifications via conditional lower bounds under the Strong Exponential Time Hypothesis (SETH).

📝 Abstract
It is well-known that checking whether a given string $w$ matches a given regular expression $r$ can be done in quadratic time $O(|w| \cdot |r|)$ and that this cannot be improved to a truly subquadratic running time of $O((|w| \cdot |r|)^{1-\epsilon})$ assuming the strong exponential time hypothesis (SETH). We study a different matching paradigm where we ask instead whether $w$ has a subsequence that matches $r$, and show that regex matching in this sense can be solved in linear time $O(|w| + |r|)$. Further, the same holds if we ask for a supersequence. We show that the quantitative variants where we want to compute a longest or shortest subsequence or supersequence of $w$ that matches $r$ can be solved in $O(|w| \cdot |r|)$, i.e., asymptotically no worse than classical regex matching; and we show that $O(|w| + |r|)$ is conditionally not possible for these problems. We also investigate these questions with respect to other natural string relations like the infix, prefix, left-extension or extension relation instead of the subsequence and supersequence relation. We further study the complexity of the universal problem where we ask if all subsequences (or supersequences, infixes, prefixes, left-extensions or extensions) of an input string satisfy a given regular expression.
Problem

Research questions and friction points this paper is trying to address.

Linear time regex matching for subsequences and supersequences
Quantitative variants with longest/shortest matching sequences
Complexity of universal problem for various string relations
Innovation

Methods, ideas, or system contributions that make the work stand out.

Linear time subsequence regex matching
Linear time supersequence regex matching
Quantitative variants solved in $O(|w| \cdot |r|)$
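
To give some intuition for why the subsequence variant can plausibly run in linear time, here is a minimal Python sketch. It is not taken from the paper and may differ from the authors' algorithm. It assumes the regex has already been compiled to an NFA with epsilon transitions (for instance by Thompson's construction, which yields an automaton of size linear in $|r|$); the function name `has_matching_subsequence` and the `(src, label, dst)` transition format are hypothetical choices made for this illustration. The observation it relies on is that, because any letter of $w$ may be skipped, the set of active NFA states only ever grows, so each state is activated once and each transition is fired at most once.

```python
from collections import defaultdict


def has_matching_subsequence(w, transitions, start, accepting):
    """Return True if some subsequence of w is accepted by the NFA.

    transitions: iterable of (src, label, dst); label None means epsilon.
    This NFA encoding is an assumption made for this sketch.
    """
    # Outgoing transitions per state.
    out = defaultdict(list)
    for src, label, dst in transitions:
        out[src].append((label, dst))

    active = set()               # states reachable via some subsequence read so far
    pending = defaultdict(list)  # letter -> targets of unfired transitions from active states

    def activate(state):
        # Activate a state and its epsilon closure; register its letter transitions.
        stack = [state]
        while stack:
            q = stack.pop()
            if q in active:
                continue
            active.add(q)
            for label, dst in out[q]:
                if label is None:
                    stack.append(dst)
                else:
                    pending[label].append(dst)

    activate(start)
    for letter in w:
        # Take the transitions waiting for this letter. Transitions registered
        # while handling this occurrence land in a fresh list and wait for a
        # later occurrence, since one letter can be consumed only once.
        for dst in pending.pop(letter, []):
            activate(dst)
    return any(q in active for q in accepting)


# Hypothetical NFA for the regex a b* c: states 0 (start), 1, 2 (accepting).
ab_star_c = [(0, "a", 1), (1, "b", 1), (1, "c", 2)]
print(has_matching_subsequence("xaybzc", ab_star_c, 0, {2}))  # True
print(has_matching_subsequence("cba", ab_star_c, 0, {2}))     # False
```

Since every state enters `active` at most once and every transition is appended to `pending` at most once, the loop over `w` does work proportional to $|w|$ plus the size of the automaton. The supersequence case and the quantitative variants need different ideas and are not covered by this sketch.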