Improved Extended Regular Expression Matching

📅 2025-10-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper studies the extended regular expression (ERE) matching problem: given an ERE $R$ and a string $Q$, decide whether $Q$ belongs to the language described by $R$. We present an algorithm whose running time depends on the matrix multiplication exponent $\omega$. The method combines dynamic programming, bit-parallelism, and fast matrix multiplication, and it handles negation and complement operators. The resulting algorithm runs in $O(n^\omega k + n^2 m / \min(w/\log w, \log n) + m)$ time and $O(n^2 + m)$ space, where $n$ is the length of $Q$, $m$ is the length of $R$, $k$ is the number of negation and complement operators in $R$, and $w$ is the machine word size. Compared with the previous best result, this replaces the dominant $n^3 k$ term in the time bound with $n^\omega k$ and reduces the $n^2 k$ term in the space bound to $O(n^2)$.

📝 Abstract

An extended regular expression $R$ specifies a set of strings formed by characters from an alphabet combined with concatenation, union, intersection, complement, and star operators. Given an extended regular expression $R$ and a string $Q$, the extended regular expression matching problem is to decide if $Q$ matches any of the strings specified by $R$. Extended regular expressions are a basic concept in formal language theory and a basic primitive for searching and processing data. Extended regular expression matching was introduced by Hopcroft and Ullman in the 1970s [Introduction to Automata Theory, Languages and Computation, 1979], who gave a simple dynamic programming solution using $O(n^3m)$ time and $O(n^2m)$ space, where $n$ is the length of $Q$ and $m$ is the length of $R$. Since then, several solutions have been proposed, but few significant asymptotic improvements have been obtained. The current state-of-the-art solution, by Yamamoto and Miyazaki [COCOON, 2003], uses $O(\frac{n^3k + n^2m}{w} + n + m)$ time and $O(\frac{n^2k + nm}{w} + n + m)$ space, where $k$ is the number of negation and complement operators in $R$ and $w$ is the number of bits in a word. This roughly replaces the $m$ factor with $k$ in the dominant terms of both the space and time bounds of the Hopcroft and Ullman algorithm. We revisit the problem and present a new solution that significantly improves the previous time and space bounds. Our main result is a new algorithm that solves extended regular expression matching in $O\left(n^\omega k + \frac{n^2m}{\min(w/\log w, \log n)} + m\right)$ time and $O(\frac{n^2 \log k}{w} + n + m) = O(n^2 + m)$ space, where $\omega \approx 2.3716$ is the exponent of matrix multiplication. Essentially, this replaces the dominant $n^3k$ term with $n^\omega k$ in the time bound, while simultaneously improving the $n^2k$ term in the space to $O(n^2)$.
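To make the problem statement concrete, here is a minimal Python sketch of a naive table-based matcher in the spirit of the simple dynamic programming approach mentioned above. It is not the algorithm from this paper and is not claimed to reproduce the Hopcroft and Ullman formulation exactly. The tuple encoding of expressions and the names `bool_product`, `match_matrix`, `matches`, and `NO_AA` are assumptions made for illustration. For each subexpression it builds a boolean table whose entry `[i][j]` says whether `Q[i:j]` matches. In this sketch, concatenation is a boolean matrix product, which is computed here with a plain cubic loop.

```python
def bool_product(a, b):
    """Naive boolean matrix product: c[i][j] = OR over t of (a[i][t] AND b[t][j])."""
    size = len(a)
    return [[any(a[i][t] and b[t][j] for t in range(size)) for j in range(size)]
            for i in range(size)]


def match_matrix(expr, q):
    """Return table M with M[i][j] True iff i <= j and q[i:j] matches expr.

    expr is a nested tuple (hypothetical encoding): ("eps",), ("chr", c),
    ("cat", a, b), ("or", a, b), ("and", a, b), ("not", a), ("star", a).
    """
    size = len(q) + 1
    op = expr[0]
    if op == "eps":
        return [[i == j for j in range(size)] for i in range(size)]
    if op == "chr":
        return [[j == i + 1 and q[i] == expr[1] for j in range(size)]
                for i in range(size)]
    if op == "not":
        m = match_matrix(expr[1], q)
        # Complement, restricted to valid substrings (i <= j).
        return [[i <= j and not m[i][j] for j in range(size)] for i in range(size)]
    if op == "star":
        m = match_matrix(expr[1], q)
        # Start from zero repetitions and add one more repetition until stable.
        result = [[i == j for j in range(size)] for i in range(size)]
        while True:
            step = bool_product(result, m)
            nxt = [[result[i][j] or step[i][j] for j in range(size)]
                   for i in range(size)]
            if nxt == result:
                return result
            result = nxt
    a = match_matrix(expr[1], q)
    b = match_matrix(expr[2], q)
    if op == "cat":
        return bool_product(a, b)
    if op == "or":
        return [[a[i][j] or b[i][j] for j in range(size)] for i in range(size)]
    if op == "and":
        return [[a[i][j] and b[i][j] for j in range(size)] for i in range(size)]
    raise ValueError(f"unknown operator: {op!r}")


def matches(expr, q):
    """Decide whether the whole string q matches expr."""
    return match_matrix(expr, q)[0][len(q)]


# Example: strings over {a, b} that do NOT contain "aa",
# written as (a|b)* AND NOT((a|b)* aa (a|b)*).
ANY = ("star", ("or", ("chr", "a"), ("chr", "b")))
HAS_AA = ("cat", ("cat", ANY, ("cat", ("chr", "a"), ("chr", "a"))), ANY)
NO_AA = ("and", ANY, ("not", HAS_AA))

print(matches(NO_AA, "abab"))   # True
print(matches(NO_AA, "abaab"))  # False
```

The sketch recomputes tables for repeated subexpressions and stores a full table per call, so it is meant only to show the shape of the problem, not to approach any of the bounds quoted above.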

Problem

Research questions and friction points this paper is trying to address.

Solving the extended regular expression matching problem efficiently

Improving time complexity from O(n^3k) to O(n^ωk)

Reducing space complexity while handling complex operators

Innovation

Methods, ideas, or system contributions that make the work stand out.

Replaces dominant n^3k term with n^ωk

Improves space complexity to O(n² + m)

Utilizes matrix multiplication exponent optimization
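
As a rough worked illustration of the first point, using the value $\omega \approx 2.3716$ quoted in the abstract and ignoring constant factors and lower-order terms (an asymptotic comparison only, not a measured speedup), the ratio of the two dominant terms is

$$\frac{n^3 k}{n^{\omega} k} = n^{3-\omega} \approx n^{0.6284}, \qquad \text{so for } n = 1024 = 2^{10}: \quad n^{0.6284} = 2^{6.284} \approx 78.$$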
