Subsequence Matching and LCS with Segment Number Constraints

📅 2024-07-29
🤖 AI Summary
This paper studies segmental subsequence pattern matching (SegE) and the segmental longest common subsequence problem (SegLCS) under a segment count constraint $f$. SegE asks whether a pattern $P$ of length $m$ has an $f$-segmentation whose segments occur, in order, as substrings of a text $T$ of length $n$; SegLCS asks for a longest string that has such an $f$-segmentation in both $T_1$ and $T_2$. First, assuming the Strong Exponential Time Hypothesis (SETH), we prove that SegE admits no $O((mn)^{1-\varepsilon})$-time algorithm for any constant $\varepsilon > 0$. Second, we present an $O(mn)$-time algorithm for SegE, which matches this conditional lower bound. Third, we design a SegLCS algorithm running in $O(f n_2 (n_1 - \ell + 1))$ time, where $\ell$ is the solution length. This bound is never worse than the prior $O(f n_1 n_2)$ dynamic programming solution and is smaller when $\ell$ is close to $n_1$. The other prior result is an $\tilde{O}((n_1 n_2)^{1-(1/3)^{f-2}})$-time algorithm for constant $f \geq 3$.

📝 Abstract
The longest common subsequence (LCS) is a fundamental problem in string processing which has numerous algorithmic studies, extensions, and applications. A sequence $u_1, \ldots, u_f$ of $f$ strings is said to be an ($f$-)segmentation of a string $P$ if $P = u_1 \cdots u_f$. Li et al. [BIBM 2022] proposed a new variant of the LCS problem for given strings $T_1, T_2$ and an integer $f$, which we hereby call the segmental LCS problem (SegLCS), of finding (the length of) a longest string $P$ that has an $f$-segmentation which can be embedded into both $T_1$ and $T_2$. Li et al. [IJTCS-FAW 2024] gave a dynamic programming solution that solves SegLCS in $O(fn_1n_2)$ time with $O(fn_1 + n_2)$ space, where $n_1 = |T_1|$, $n_2 = |T_2|$, and $n_1 \le n_2$. Recently, Banerjee et al. [ESA 2024] presented an algorithm which, for a constant $f \geq 3$, solves SegLCS in $\tilde{O}((n_1n_2)^{1-(1/3)^{f-2}})$ time. In this paper, we deal with SegLCS as well as the problem of segmental subsequence pattern matching, SegE, that asks to determine whether a pattern $P$ of length $m$ has an $f$-segmentation that can be embedded into a text $T$ of length $n$. When $f = 1$, this is equivalent to substring matching, and when $f = |P|$, this is equivalent to subsequence matching. Our focus in this article is the case of general values of $f$, and our main contributions are threefold: (1) an $O((mn)^{1-\epsilon})$-time conditional lower bound for SegE under the strong exponential-time hypothesis (SETH), for any constant $\epsilon > 0$; (2) an $O(mn)$-time algorithm for SegE; (3) an $O(fn_2(n_1 - \ell + 1))$-time algorithm for SegLCS, where $\ell$ is the solution length.
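To make the SegE definition concrete, here is a minimal Python sketch of a textbook-style dynamic program that decides it in $O(mn)$ time. It is an illustration only and is not claimed to be the algorithm from the paper. It assumes that segments may be empty, so "has an $f$-segmentation" is treated as "can be embedded using at most $f$ non-empty segments". The function names `min_segments` and `seg_match` are hypothetical.

```python
import math


def min_segments(pattern: str, text: str) -> float:
    """Fewest contiguous segments needed to embed `pattern` in `text`.

    Returns math.inf when `pattern` is not a subsequence of `text`.
    Assumption (not from the paper): segments are counted as non-empty blocks.
    """
    n = len(text)
    # prev_any[j]: fewest segments embedding the current pattern prefix in text[:j]
    # prev_end[j]: same, but the prefix's last character is matched at text[j-1]
    prev_any = [0] * (n + 1)          # the empty prefix needs no segments
    prev_end = [math.inf] * (n + 1)
    for ch in pattern:
        cur_any = [math.inf] * (n + 1)
        cur_end = [math.inf] * (n + 1)
        for j in range(1, n + 1):
            if text[j - 1] == ch:
                # either extend the segment ending at text[j-2],
                # or open a new segment at text[j-1]
                cur_end[j] = min(prev_end[j - 1], prev_any[j - 1] + 1)
            cur_any[j] = min(cur_any[j - 1], cur_end[j])
        prev_any, prev_end = cur_any, cur_end
    return prev_any[n]


def seg_match(pattern: str, text: str, f: int) -> bool:
    """True if `pattern` can be embedded in `text` with at most `f` segments."""
    return min_segments(pattern, text) <= f
```

With `f = 1` the check reduces to substring matching, and with `f = len(pattern)` it reduces to subsequence matching, which mirrors the two special cases named in the abstract.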
Problem

Research questions and friction points this paper is trying to address.

Explore segmental LCS problem
Develop efficient SegE algorithm
Provide conditional lower bounds
Innovation

Methods, ideas, or system contributions that make the work stand out.

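SegLCS algorithm whose $O(fn_2(n_1 - \ell + 1))$ running time depends on the solution length $\ell$
SETH-based conditional lower bound of $O((mn)^{1-\epsilon})$ for SegE
$O(mn)$-time algorithm for SegE, matching the lower bound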
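
For reference, the SegLCS objective can be computed with a straightforward dynamic program in $O(fn_1n_2)$ time, the same bound as the earlier solution cited in the abstract. The Python sketch below is a baseline illustration under stated assumptions: it is not the paper's $O(fn_2(n_1 - \ell + 1))$ algorithm, it is not claimed to reproduce the exact formulation of Li et al., and it uses $O(n_1n_2)$ space for clarity. It treats an $f$-segmentation as "at most $f$ non-empty segments", and the name `seg_lcs_length` is hypothetical.

```python
def seg_lcs_length(t1: str, t2: str, f: int) -> int:
    """Length of a longest string that splits into at most `f` segments,
    each occurring as a substring of both `t1` and `t2`, in the same order
    and without overlap.

    Assumption (not from the paper): empty segments are allowed, so exactly
    `f` segments and at most `f` non-empty segments coincide.
    """
    n1, n2 = len(t1), len(t2)
    neg_inf = float("-inf")
    # best_prev[i][j]: optimum for t1[:i], t2[:j] with one segment fewer
    best_prev = [[0] * (n2 + 1) for _ in range(n1 + 1)]  # zero segments
    for _ in range(f):
        best = [[0] * (n2 + 1) for _ in range(n1 + 1)]
        # end[i][j]: optimum whose last segment ends exactly at
        # t1[i-1] and t2[j-1]; -inf when those characters differ
        end = [[neg_inf] * (n2 + 1) for _ in range(n1 + 1)]
        for i in range(1, n1 + 1):
            for j in range(1, n2 + 1):
                if t1[i - 1] == t2[j - 1]:
                    # extend the current segment, or start a new one
                    end[i][j] = 1 + max(end[i - 1][j - 1],
                                        best_prev[i - 1][j - 1])
                best[i][j] = max(best[i - 1][j], best[i][j - 1], end[i][j])
        best_prev = best
    return best_prev[n1][n2]
```

For example, `seg_lcs_length("abcde", "abXde", 1)` finds a single common block of length 2, while allowing two segments recovers both `ab` and `de` for a total of 4.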