Longest Unbordered Factors on Run-Length Encoded Strings

📅 2025-07-22
🤖 AI Summary
This paper addresses the computation of the longest unbordered factor (LUF) on run-length encoded (RLE) strings. For an uncompressed string of length n, the best known worst-case time bound is O(n log n). The authors present a structural observation relating unbordered factors to RLE-compressed strings and, building on it, adapt a key idea of the O(n¹·⁵)-time algorithm of Gawrychowski et al. to the compressed setting. The resulting algorithm runs in O(m¹·⁵ log² m) time and O(m log² m) space, where m is the size of the RLE representation. When m is sufficiently small compared to n, the algorithm may run in time linear in n, which could improve on existing methods in such cases.

📝 Abstract
A border of a string is a non-empty proper prefix of the string that is also a suffix. A string is unbordered if it has no border. The longest unbordered factor is a fundamental notion in stringology, closely related to string periodicity. This paper addresses the longest unbordered factor problem: given a string of length $n$, the goal is to compute its longest factor that is unbordered. While recent work has achieved subquadratic and near-linear time algorithms for this problem, the best known worst-case time complexity remains $O(n \log n)$ [Kociumaka et al., ISAAC 2018]. In this paper, we investigate the problem in the context of compressed string processing, particularly focusing on run-length encoded (RLE) strings. We first present a simple yet crucial structural observation relating unbordered factors and RLE-compressed strings. Building on this, we propose an algorithm that solves the problem in $O(m^{1.5} \log^2 m)$ time and $O(m \log^2 m)$ space, where $m$ is the size of the RLE-compressed input string. To achieve this, our approach simulates a key idea from the $O(n^{1.5})$-time algorithm by [Gawrychowski et al., SPIRE 2015], adapting it to the RLE setting through new combinatorial insights. When the RLE size $m$ is sufficiently small compared to $n$, our algorithm may show linear-time behavior in $n$, potentially leading to improved performance over existing methods in such cases.
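
The definitions in the abstract can be made concrete with a small brute-force sketch. This is not the paper's algorithm: it is a naive baseline that checks each candidate factor directly against the definition of a border and is far slower than the bounds discussed above. The function names `is_unbordered` and `longest_unbordered_factor` are hypothetical and chosen only for illustration.

```python
def is_unbordered(s: str) -> bool:
    """Return True if no non-empty proper prefix of s is also a suffix of s."""
    # k ranges over the lengths of all non-empty proper prefixes.
    return not any(s[:k] == s[-k:] for k in range(1, len(s)))


def longest_unbordered_factor(s: str) -> str:
    """Naive baseline (assumption: illustration only, not the paper's method).

    Tries factor lengths from longest to shortest and returns the leftmost
    unbordered factor of the first length that has one.
    """
    n = len(s)
    for length in range(n, 0, -1):
        for start in range(n - length + 1):
            factor = s[start:start + length]
            if is_unbordered(factor):
                return factor
    return ""  # only reached for the empty string
```

For example, `"aabaa"` has the border `"a"` (and `"aa"`), so it is bordered, while its factor `"aab"` has no border; calling `longest_unbordered_factor("aabaa")` therefore yields a factor of length 3.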
Problem

Research questions and friction points this paper is trying to address.

Find longest unbordered factor in RLE-compressed strings
Improve time complexity over existing methods
Adapt algorithms for compressed string processing
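
To illustrate the input representation, here is a minimal sketch of run-length encoding, where each maximal run of equal characters becomes one (character, run length) pair. The helper names `rle_encode` and `rle_decode` are assumptions made for this example; the paper's own data structures are not described in this summary.

```python
from itertools import groupby


def rle_encode(s: str) -> list[tuple[str, int]]:
    """Encode s as a list of (character, run length) pairs, one per maximal run."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def rle_decode(runs: list[tuple[str, int]]) -> str:
    """Expand (character, run length) pairs back into the original string."""
    return "".join(ch * count for ch, count in runs)


text = "aaaaaaaabbbbbbbbbbbbc"
runs = rle_encode(text)
n, m = len(text), len(runs)  # n = uncompressed length, m = number of runs
```

Here the 21-character string compresses to 3 runs, so m is much smaller than n; this is the regime in which a bound stated in terms of m may be advantageous.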
Innovation

Methods, ideas, or system contributions that make the work stand out.

RLE string structural observation for unbordered factors
Adapted algorithm with O(m¹·⁵ log² m) time complexity
Simulated key idea from Gawrychowski et al. with new insights
Shoma Sekizaki
University of Electro-Communications, Chofu, Japan
Takuya Mieno
The University of Electro-Communications
Stringology