🤖 AI Summary
This paper studies the segmental subsequence pattern matching (SegE) and segmental longest common subsequence (SegLCS) problems, both parameterized by a segment count $f$. SegE asks whether a pattern $P$ of length $m$ can be split into $f$ contiguous segments that occur in order, possibly with gaps, in a text $T$ of length $n$. SegLCS asks for a longest string that admits such an $f$-segmentation embeddable into both of two texts $T_1$ and $T_2$. First, assuming the Strong Exponential Time Hypothesis (SETH), we prove that SegE admits no $O((mn)^{1-\varepsilon})$-time algorithm for any constant $\varepsilon > 0$. Second, we present an $O(mn)$-time algorithm for SegE, which shows that this conditional lower bound is tight. Third, we design a SegLCS algorithm running in $O(f n_2 (n_1 - \ell + 1))$ time, parameterized by the solution length $\ell$. Prior SegLCS algorithms run in $O(f n_1 n_2)$ time and, for constant $f \geq 3$, in $\tilde{O}((n_1 n_2)^{1-(1/3)^{f-2}})$ time. The new bound never exceeds $O(f n_1 n_2)$ and becomes much smaller when $\ell$ is close to $n_1$: for example, it is $O(f n_2)$ when $\ell = n_1$. The SegE algorithm is therefore conditionally optimal, and the SegLCS algorithm is output-sensitive.
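To make the SegE definition concrete, here is a minimal Python sketch of one simple $O(mn)$ dynamic program. It is our own formulation and not necessarily the authors' algorithm. It assumes segments are nonempty and $1 \le f \le m$. It also assumes that "embedded into $T$" means $T = x_0 u_1 x_1 \cdots u_f x_f$ for some gap strings $x_i$.

Under these assumptions, any embedding with $k \le f$ segments can be refined to exactly $f$ segments by splitting a segment where the new gap is empty. SegE therefore reduces to computing the minimum number of segments. The names `min_segments` and `seg_e` are hypothetical.

```python
from math import inf


def min_segments(P: str, T: str) -> float:
    """Fewest nonempty, in-order substrings of T whose concatenation is P.

    Returns inf if P is not a subsequence of T. Runs in O(len(P) * len(T)).
    """
    m, n = len(P), len(T)
    if m == 0:
        return 0
    # prev[j]: fewest segments embedding P[:i] into T with P[i-1] matched to T[j]
    prev = [1 if T[j] == P[0] else inf for j in range(n)]
    for i in range(1, m):
        cur = [inf] * n
        best_before = inf  # min(prev[0..j-1]): P[:i] ends strictly before T[j]
        for j in range(n):
            if T[j] == P[i]:
                # Either extend the current segment (P[i-1] matched to T[j-1])
                # or open a new segment after any earlier match of P[i-1].
                extend = prev[j - 1] if j > 0 else inf
                cur[j] = min(extend, best_before + 1)
            best_before = min(best_before, prev[j])
        prev = cur
    return min(prev, default=inf)


def seg_e(P: str, T: str, f: int) -> bool:
    """Hypothetical SegE decision: does P have an f-segmentation embeddable in T?"""
    if not 1 <= f <= len(P):
        raise ValueError("this sketch assumes 1 <= f <= len(P)")
    # k <= f segments can always be refined to exactly f by splitting a segment
    # (the new gap between the two halves is empty), so compare with the minimum.
    return min_segments(P, T) <= f
```

For instance, `seg_e("abc", "xabcx", 1)` reduces to substring matching. `seg_e("abc", "axbxc", 3)` reduces to subsequence matching, mirroring the $f = 1$ and $f = |P|$ special cases.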
📝 Abstract
The longest common subsequence (LCS) is a fundamental problem in string processing that has been the subject of numerous algorithmic studies, extensions, and applications. A sequence $u_1, \ldots, u_f$ of $f$ strings is said to be an ($f$-)segmentation of a string $P$ if $P = u_1 \cdots u_f$. Li et al. [BIBM 2022] proposed a new variant of the LCS problem for given strings $T_1, T_2$ and an integer $f$, which we hereby call the segmental LCS problem (SegLCS): find (the length of) a longest string $P$ that has an $f$-segmentation which can be embedded into both $T_1$ and $T_2$. Li et al. [IJTCS-FAW 2024] gave a dynamic programming solution that solves SegLCS in $O(fn_1n_2)$ time with $O(fn_1 + n_2)$ space, where $n_1 = |T_1|$, $n_2 = |T_2|$, and $n_1 \le n_2$. Recently, Banerjee et al. [ESA 2024] presented an algorithm which, for a constant $f \geq 3$, solves SegLCS in $\tilde{O}((n_1n_2)^{1-(1/3)^{f-2}})$ time. In this paper, we deal with SegLCS as well as the problem of segmental subsequence pattern matching, SegE, which asks whether a pattern $P$ of length $m$ has an $f$-segmentation that can be embedded into a text $T$ of length $n$. When $f = 1$, this is equivalent to substring matching, and when $f = |P|$, it is equivalent to subsequence matching. Our focus in this article is the case of general values of $f$, and our main contributions are threefold: (1) a conditional lower bound under the strong exponential-time hypothesis (SETH) showing that SegE cannot be solved in $O((mn)^{1-\epsilon})$ time for any constant $\epsilon > 0$; (2) an $O(mn)$-time algorithm for SegE; and (3) an $O(fn_2(n_1 - \ell + 1))$-time algorithm for SegLCS, where $\ell$ is the solution length.
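The SegLCS definition can be illustrated with a baseline dynamic program in the same $O(f n_1 n_2)$ time class as Li et al.'s solution. This is our own recurrence, not Li et al.'s exact method and not the paper's $\ell$-sensitive algorithm. It also uses $O(n_1 n_2)$ space rather than $O(fn_1 + n_2)$.

The sketch assumes an $f$-segmentation may contain empty segments, so a solution uses at most $f$ nonempty common substrings, placed in the same order in $T_1$ and $T_2$. The state `M[i][j]` is the best total when the current segment ends by matching `T1[i-1]` with `T2[j-1]`. The state `L[i][j]` is the best total within the prefixes `T1[:i]` and `T2[:j]`. The name `seg_lcs` is hypothetical.

```python
NEG = float("-inf")


def seg_lcs(T1: str, T2: str, f: int) -> int:
    """Length of a longest string with <= f nonempty segments embeddable in T1 and T2.

    Baseline O(f * len(T1) * len(T2)) DP; one layer per allowed segment count.
    """
    n1, n2 = len(T1), len(T2)
    # L_prev[i][j]: best total with at most k-1 segments inside T1[:i], T2[:j]
    L_prev = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for _ in range(f):
        L = [[0] * (n2 + 1) for _ in range(n1 + 1)]
        # M[i][j]: best total whose k-th segment ends matching T1[i-1] with T2[j-1]
        M = [[NEG] * (n2 + 1) for _ in range(n1 + 1)]
        for i in range(1, n1 + 1):
            for j in range(1, n2 + 1):
                if T1[i - 1] == T2[j - 1]:
                    # Extend the current segment diagonally, or start segment k
                    # right after the best (k-1)-segment solution in the prefixes.
                    M[i][j] = 1 + max(M[i - 1][j - 1], L_prev[i - 1][j - 1])
                L[i][j] = max(L[i - 1][j], L[i][j - 1], M[i][j])
        L_prev = L
    return L_prev[n1][n2]
```

With `f = 1` the recurrence computes a longest common substring, for example 2 for `"abcde"` and `"abxde"`. Raising `f` to 2 lets it combine `"ab"` and `"de"` for a total of 4. Once `f` is at least the LCS length, it returns the ordinary LCS length.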