🤖 AI Summary
This work addresses the rigidity of exact Cartesian tree matching, which is sensitive to outliers in time series data, by studying approximate Cartesian tree matching under substitution errors: the task is to find all fragments of the text that are within Hamming distance $k$ of some string whose Cartesian tree matches that of the pattern. We introduce a new notion of periodicity in Cartesian trees, lift classical combinatorial and algorithmic results on string periodicity and string matching to this setting, and use these tools to design the matching algorithm. For a text of length $n$ and a pattern of length $m$, the proposed algorithm runs in $O(n\sqrt{m}\,k^{2.5})$ time when $k \leq m^{1/5}$ and in $O(nk^5)$ time when $k \geq m^{1/5}$, improving upon the state-of-the-art $O(nmk)$-time algorithm of Kim and Han [TCS 2025] in the regime $k = o(m^{1/4})$.
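The two bounds and the stated regime can be sanity-checked with a short calculation. The following is a worked comparison derived only from the running times quoted above, not a derivation taken from the paper.

$$
\begin{aligned}
k = m^{1/5}: &\quad n\sqrt{m}\,k^{2.5} = n\,m^{1/2}\,m^{1/2} = nm = n\,k^{5}, \\
k \leq m^{1/5}: &\quad \frac{n\sqrt{m}\,k^{2.5}}{nmk} = \frac{k^{1.5}}{m^{1/2}} \leq \frac{m^{3/10}}{m^{1/2}} = m^{-1/5}, \\
k \geq m^{1/5}: &\quad \frac{n\,k^{5}}{nmk} = \frac{k^{4}}{m} = o(1) \iff k = o(m^{1/4}).
\end{aligned}
$$

So the two bounds coincide at $k = m^{1/5}$, the first bound is always asymptotically below $O(nmk)$ on its range, and the second is below $O(nmk)$ exactly when $k = o(m^{1/4})$.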
📝 Abstract
The Cartesian tree of a sequence captures the relative order of the sequence's elements. In recent years, Cartesian tree matching has attracted considerable attention, particularly due to its applications in time series analysis. Consider a text $T$ of length $n$ and a pattern $P$ of length $m$. In the exact Cartesian tree matching problem, the task is to find all length-$m$ fragments of $T$ whose Cartesian tree coincides with the Cartesian tree $CT(P)$ of the pattern. Although the exact version of the problem can be solved in linear time [Park et al., TCS 2020], it remains rather restrictive; for example, it is not robust to outliers in the pattern. To overcome this limitation, we consider the approximate setting, where the goal is to identify all fragments of $T$ that are close to some string whose Cartesian tree matches $CT(P)$. In this work, we quantify closeness via the widely used Hamming distance metric. For a given integer parameter $k>0$, we present an algorithm that computes all fragments of $T$ that are at Hamming distance at most $k$ from a string whose Cartesian tree matches $CT(P)$. Our algorithm runs in time $\mathcal O(n \sqrt{m} \cdot k^{2.5})$ for $k \leq m^{1/5}$ and in time $\mathcal O(nk^5)$ for $k \geq m^{1/5}$, thereby improving upon the state-of-the-art $\mathcal O(nmk)$-time algorithm of Kim and Han [TCS 2025] in the regime $k = o(m^{1/4})$. On the way to our solution, we develop a toolbox of independent interest. First, we introduce a new notion of periodicity in Cartesian trees. Then, we lift multiple well-known combinatorial and algorithmic results for string matching and periodicity in strings to Cartesian tree matching and periodicity in Cartesian trees.
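To make the exact matching problem concrete, here is a minimal Python sketch. It relies on the parent-distance representation used by Park et al.: two sequences have the same Cartesian tree exactly when their parent-distance arrays coincide. The function names `parent_distance` and `naive_ct_match` are hypothetical, and the matcher is a naive $O(nm)$ baseline for illustration only; it is neither the linear-time exact algorithm nor the approximate algorithm of this work.

```python
def parent_distance(seq):
    """Parent-distance array of seq.

    Entry i is the distance from i to the nearest position j < i with
    seq[j] <= seq[i], or 0 if no such position exists.
    """
    pd = []
    stack = []  # indices whose values are non-decreasing from bottom to top
    for i, x in enumerate(seq):
        # Discard positions holding values strictly larger than the current one.
        while stack and seq[stack[-1]] > x:
            stack.pop()
        pd.append(i - stack[-1] if stack else 0)
        stack.append(i)
    return pd


def naive_ct_match(text, pattern):
    """Start positions of all length-m fragments of text whose Cartesian tree
    equals that of pattern (naive O(nm) baseline, for illustration only)."""
    m = len(pattern)
    target = parent_distance(pattern)
    return [
        i
        for i in range(len(text) - m + 1)
        if parent_distance(text[i:i + m]) == target
    ]


if __name__ == "__main__":
    # The pattern has a "high, low, middle" shape; positions 0 and 4 of the text share it.
    print(naive_ct_match([5, 3, 4, 9, 8, 6, 7], [2, 1, 3]))
```

In the approximate setting, a fragment would also be reported if changing at most $k$ of its values could yield a sequence with the pattern's Cartesian tree; that check is deliberately left out of this sketch.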