🤖 AI Summary
This paper studies property testing for pattern matching: given a pattern $P$ of length $m$ and a text $T$ of length $n$, distinguish the case where $P$ occurs exactly as a substring of $T$ from the case where every length-$m$ substring of $T$ is at Hamming distance greater than $k$ from $P$. We present adaptive and non-adaptive algorithms that jointly cover the full parameter regime $k \in [1,m]$, establishing tight upper and lower bounds on both time and query complexity, and revealing an inherent separation between adaptive and non-adaptive settings in the regime $n = m + o(m)$. Our approach combines randomized hashing, information-theoretic lower bound analysis, and adaptive query strategies. For canonical regimes such as $n = m + \Theta(m)$, our algorithm achieves $\tilde{O}(n/\sqrt{k})$ time complexity, improving upon the prior best $\tilde{O}(n/k^{1/3})$, and we prove this bound is optimal up to polylogarithmic factors.
📝 Abstract
The classic exact pattern matching problem, given two strings -- a pattern $P$ of length $m$ and a text $T$ of length $n$ -- asks whether $P$ occurs as a substring of $T$. A property tester for the problem needs to distinguish (with high probability) the following two cases for some threshold $k$: the YES case, where $P$ occurs as a substring of $T$, and the NO case, where $P$ has Hamming distance greater than $k$ from every substring of $T$, that is, $P$ has no $k$-mismatch occurrence in $T$.
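To make the YES and NO cases concrete, the following is a minimal sketch of an exhaustive reference check. It reads every character and takes time proportional to $(n-m+1)\cdot m$, so it is not a property tester in the sublinear sense discussed here; it only illustrates the promise problem. The function names `hamming` and `classify` and the `"EITHER"` label are assumptions introduced for illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def classify(pattern: str, text: str, k: int) -> str:
    """Exhaustive classification of an instance (illustrative only).

    Returns "YES" if the pattern occurs exactly in the text, "NO" if every
    length-m substring of the text has more than k mismatches with the
    pattern, and "EITHER" if the instance falls between the two cases,
    where a property tester is allowed to answer arbitrarily.
    """
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        raise ValueError("expected 1 <= len(pattern) <= len(text)")
    # Smallest Hamming distance between the pattern and any alignment.
    best = min(hamming(pattern, text[i:i + m]) for i in range(n - m + 1))
    if best == 0:
        return "YES"
    if best > k:
        return "NO"
    return "EITHER"
```

For example, `classify("abc", "xxabdxx", 0)` falls in the NO case, while the same strings with `k = 1` fall in the gap between the two cases, because the closest alignment has exactly one mismatch.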
In this work, we provide adaptive and non-adaptive property testers for the exact pattern matching problem, jointly covering the whole spectrum of parameters. We further establish unconditional lower bounds demonstrating that the time and query complexities of our algorithms are optimal, up to $\mathrm{polylog}\, n$ factors hidden within the $\tilde O(\cdot)$ notation below.
In the most studied regime of $n=m+\Theta(m)$, our non-adaptive property tester has the time complexity of $\tilde O(n/\sqrt{k})$, and a matching lower bound remains valid for the query complexity of adaptive algorithms. This improves both upon a folklore solution that attains the optimal query complexity but requires $\Omega(n)$ time, and upon the only previously known sublinear-time property tester, by Chan, Golan, Kociumaka, Kopelowitz, and Porat [STOC 2020], with time complexity $\tilde O(n/\sqrt[3]{k})$. The aforementioned results remain valid for $n=m+\Omega(m)$, where our optimal running time $\tilde O(\sqrt{nm/k}+n/k)$ improves upon the previously best time complexity of $\tilde O(\sqrt[3]{n^2m/k}+n/k)$. In the regime of $n=m+o(m)$, which has not been targeted in any previous work, we establish a surprising separation between adaptive and non-adaptive algorithms, whose optimal time and query complexities are $\tilde O(\sqrt{(n-m+1)m/k}+n/k)$ and $\tilde O(\min(n\sqrt{n-m+1}/k,\sqrt{nm/k}+n/k))$, respectively.
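As a consistency check on the bounds quoted above, the general expressions can be specialized to the regime $n=m+\Theta(m)$. The short derivation below is a sketch that assumes $m=\Theta(n)$ (which follows from $n=m+\Theta(m)$) and $k\ge 1$:

$$
\sqrt{\frac{nm}{k}}+\frac{n}{k}
=\Theta\!\left(\frac{n}{\sqrt{k}}\right)+\frac{n}{k}
=\Theta\!\left(\frac{n}{\sqrt{k}}\right),
\qquad\text{since } \frac{n}{k}\le\frac{n}{\sqrt{k}} \text{ for } k\ge 1,
$$

$$
\sqrt[3]{\frac{n^2m}{k}}+\frac{n}{k}
=\Theta\!\left(\frac{n}{\sqrt[3]{k}}\right),
\qquad
\frac{n/\sqrt[3]{k}}{n/\sqrt{k}}=k^{1/6}.
$$

Under these assumptions, the new bound is smaller than the previous one by a factor of $k^{1/6}$, up to the polylogarithmic factors hidden in the $\tilde O(\cdot)$ notation.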