🤖 AI Summary
This study addresses the set parameterized matching problem. Given a pattern and a text, each a sequence of sets of characters, the task is to find every text position where the pattern matches a text substring up to a bijection between their alphabets, that is, where some renaming of characters maps each pattern set onto the corresponding text set. To tackle this, the work extends Karp–Rabin fingerprinting to set-strings and proposes a randomized algorithm based on a three-layer hashing scheme. The scheme encodes text substrings dynamically in a single pass over the text while checking whether a valid bijection exists. It addresses three challenges (representation blowup, set-to-set matching, and the dynamic encoding of text substrings) and runs in O(N + M) time with high probability, where N is the text size and M is the pattern size.
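The following minimal brute-force sketch in Python pins down the definition. It is a reference check, not the paper's algorithm. It assumes each set-string is a list of frozensets and that each string's alphabet is the set of characters occurring in it. The helper names `signatures`, `set_p_match`, and `naive_occurrences` are hypothetical.

The check rests on one observation. A bijection f with f(P[i]) = T[i] for every i must send each character to a character that appears at exactly the same positions. A match therefore exists iff both strings have the same multiset of per-character position sets.

```python
from collections import Counter
from typing import FrozenSet, Hashable, List, Sequence

SetString = Sequence[FrozenSet[Hashable]]


def signatures(s: SetString) -> Counter:
    """Multiset of position signatures: for each character c occurring in s,
    the set of indices i with c in s[i]."""
    positions = {}
    for i, chars in enumerate(s):
        for c in chars:
            positions.setdefault(c, set()).add(i)
    return Counter(frozenset(p) for p in positions.values())


def set_p_match(p: SetString, t: SetString) -> bool:
    """True iff some bijection f between the characters occurring in p and in t
    satisfies f(p[i]) == t[i] for every position i (equal-length case)."""
    if len(p) != len(t):
        return False
    # f must pair characters with identical signatures, so a bijection exists
    # exactly when the two signature multisets coincide.
    return signatures(p) == signatures(t)


def naive_occurrences(pattern: SetString, text: SetString) -> List[int]:
    """Baseline matcher: re-check every window of len(pattern) sets from scratch."""
    m = len(pattern)
    return [j for j in range(len(text) - m + 1)
            if set_p_match(pattern, text[j:j + m])]
```

For example, take `P = [frozenset("ab"), frozenset("a")]` and the text `[{x, y}, {y}, {z}, {z, w}, {w}]`. `naive_occurrences` reports windows 0 and 3, via a ↦ y, b ↦ x and a ↦ w, b ↦ z. Rebuilding signatures for every window is what makes this baseline slow.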
📝 Abstract
We study the "set parameterized matching" problem, a generalization of the classical parameterized matching problem introduced by Baker. In set parameterized matching, both the pattern and text are sequences where each position contains a set of characters rather than a single character. Two set-strings parameterized match if there exists a bijection between their alphabets that maps one to the other set-wise. Boussidan introduced this problem for the case of equal-length set-strings. We present a randomized algorithm running in $O(N + M)$ time with high probability, where $N$ is the text size and $M$ is the pattern size. Our approach employs a novel three-layer hashing scheme based on Karp-Rabin fingerprinting that addresses the challenges of (1) the size blowup in representations of the problem, (2) set-to-set matching, and (3) the dynamic nature of encodings of text substrings during pattern scanning.
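The following minimal sketch makes the Karp–Rabin ingredient concrete. It performs *exact* set-string matching with two fingerprint layers:

- Each set is hashed order-independently as ∏(z − a) mod p for a random point z.
- The resulting sequence of set fingerprints is scanned with a rolling Karp–Rabin hash.

The sketch assumes characters are integers in [0, P_MOD). The names `P_MOD`, `set_fingerprint`, and `exact_set_occurrences` are hypothetical. It does not search for a bijection. Handling an encoding whose values change as the window slides is the difficulty the abstract lists as challenge (3), and nothing below reproduces the paper's three-layer scheme.

```python
import random
from typing import FrozenSet, List, Optional, Sequence

P_MOD = (1 << 61) - 1  # large Mersenne prime; hypothetical choice of modulus


def set_fingerprint(s: FrozenSet[int], z: int) -> int:
    """Order-independent set fingerprint: product of (z - a) mod P_MOD.
    Equal sets always agree; distinct sets collide only if z is a root of
    the difference polynomial, which is unlikely for a random z."""
    h = 1
    for a in s:
        h = h * (z - a) % P_MOD
    return h


def exact_set_occurrences(pattern: Sequence[FrozenSet[int]],
                          text: Sequence[FrozenSet[int]],
                          seed: Optional[int] = None) -> List[int]:
    """Monte Carlo exact matching: report j where text[j:j+m] equals the
    pattern set by set (no renaming). False positives are possible but
    unlikely for a large modulus."""
    rng = random.Random(seed)
    z = rng.randrange(P_MOD)     # evaluation point for the set layer
    x = rng.randrange(2, P_MOD)  # Karp-Rabin base for the sequence layer
    fp = [set_fingerprint(s, z) for s in pattern]
    ft = [set_fingerprint(s, z) for s in text]
    m, n = len(fp), len(ft)
    if m == 0 or m > n:
        return []

    def kr(seq: Sequence[int]) -> int:
        # Polynomial hash: sum of seq[k] * x^(len-1-k) mod P_MOD.
        h = 0
        for v in seq:
            h = (h * x + v) % P_MOD
        return h

    target = kr(fp)
    top = pow(x, m - 1, P_MOD)   # weight of the leftmost set in a window
    h = kr(ft[:m])
    out = []
    for j in range(n - m + 1):
        if h == target:
            out.append(j)
        if j + m < n:            # roll the window: drop ft[j], append ft[j + m]
            h = ((h - ft[j] * top) * x + ft[j + m]) % P_MOD
    return out
```

Each set is hashed once and each window update costs O(1), so the sketch runs in time linear in the total input size. However, it finds only exact occurrences. Under a renaming such as 1 ↦ 7, 2 ↦ 8, the set {7, 8} receives an unrelated fingerprint. Parameterized matching therefore needs an encoding that does not depend on the concrete character names.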