🤖 AI Summary
This paper addresses the computation of the longest unbordered factor (LUF) of a run-length encoded (RLE) string. For an uncompressed string of length n, the best known worst-case running time is O(n log n). We identify a structural connection between unbordered factors and the runs of the RLE representation, and use it, together with new combinatorial insights, to adapt the classic O(n¹·⁵)-time algorithm to the compressed setting. The resulting algorithm runs in O(m¹·⁵ log² m) time and O(m log² m) space, where m is the size of the RLE representation. When m is sufficiently small compared to n (roughly, when m¹·⁵ log² m = O(n)), the running time is at most linear in n, which may improve on methods that work on the uncompressed string. This is the regime of highly repetitive sequences, such as those that arise in bioinformatics and data compression.
📝 Abstract
A border of a string is a non-empty proper prefix of the string that is also a suffix. A string is unbordered if it has no border. The longest unbordered factor is a fundamental notion in stringology, closely related to string periodicity. This paper addresses the longest unbordered factor problem: given a string of length $n$, compute its longest factor that is unbordered. While recent work has achieved subquadratic and near-linear time algorithms for this problem, the best known worst-case time complexity remains $O(n \log n)$ [Kociumaka et al., ISAAC 2018]. In this paper, we investigate the problem in the context of compressed string processing, focusing on run-length encoded (RLE) strings. We first present a simple yet crucial structural observation relating unbordered factors and RLE-compressed strings. Building on this, we propose an algorithm that solves the problem in $O(m^{1.5} \log^2 m)$ time and $O(m \log^2 m)$ space, where $m$ is the size of the RLE-compressed input string. Our approach simulates a key idea from the $O(n^{1.5})$-time algorithm of Gawrychowski et al. [SPIRE 2015], adapting it to the RLE setting through new combinatorial insights. When the RLE size $m$ is sufficiently small compared to $n$, the running time of our algorithm is at most linear in $n$, which may improve on existing methods in such cases.
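To make the definitions above concrete, the following minimal Python sketch checks whether a string is unbordered, finds a longest unbordered factor by brute force, and computes a run-length encoding. It is not the algorithm proposed in the paper: it works on the uncompressed string in $O(n^2)$ time and serves only as a reference for the problem statement. The function names (`prefix_function`, `is_unbordered`, `longest_unbordered_factor_naive`, `run_length_encode`) are illustrative choices, not taken from the paper.

```python
from itertools import groupby


def prefix_function(s):
    """pi[i] = length of the longest border of s[:i + 1] (0 if there is none)."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        # Fall back through shorter borders until one can be extended by s[i].
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def is_unbordered(s):
    """A non-empty string is unbordered iff its longest border has length 0."""
    return len(s) > 0 and prefix_function(s)[-1] == 0


def longest_unbordered_factor_naive(s):
    """Brute-force baseline: O(n^2) time on the uncompressed string.

    For every start position, the prefix function of the suffix s[start:]
    reveals which factors beginning at that position are unbordered.
    Returns the leftmost longest unbordered factor.
    """
    best = ""
    for start in range(len(s)):
        pi = prefix_function(s[start:])
        for end, border_len in enumerate(pi):
            if border_len == 0 and end + 1 > len(best):
                best = s[start:start + end + 1]
    return best


def run_length_encode(s):
    """Run-length encoding as a list of (character, run length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]


if __name__ == "__main__":
    text = "aaabbbbab"
    print(run_length_encode(text))
    print(longest_unbordered_factor_naive(text))
```

The number of pairs returned by `run_length_encode` corresponds to the parameter $m$ in the abstract, while `len(text)` corresponds to $n$.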