String Representation in Suffixient Set Size Space

📅 2026-04-05
🤖 AI Summary
This study resolves an open problem concerning the reachability of the string repetitiveness measure χ(w): whether every string w admits a representation of size O(χ(w)). The paper introduces the substring equation system (SES), a new model, and combines it with the combinatorial structure of suffixient sets to construct the first representation scheme that stores any string w in O(χ(w)) space. The result establishes that χ is reachable and offers a new way to model compression that exploits the structural regularities of repetitive strings.
📝 Abstract
Repetitiveness measures quantify how much repetitive structure a string contains and serve as parameters for compressed representations and indexing data structures. We study the measure $\chi$, defined as the size of the smallest suffixient set. Although $\chi$ has been studied extensively, its reachability, that is, whether every string $w$ admits a string representation of size $O(\chi(w))$ words, has remained an important open problem. We answer this question affirmatively by presenting the first such representation scheme. Our construction is based on a new model, the substring equation system (SES), and we show that every string admits an SES of size $O(\chi(w))$.
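To make the measure concrete, the following is a minimal brute-force sketch of computing the size of a smallest suffixient set on toy inputs. The abstract does not spell out the definition, so the sketch assumes the one commonly used in the suffixient-set literature: a set S of positions is suffixient for a text T if every one-character right-extension of every right-maximal substring of T (including the empty string) is a suffix of some prefix T[1..x] with x in S. The function names are hypothetical, the search is exponential, and none of this reflects the paper's SES construction.

```python
from itertools import combinations


def right_extensions(t):
    """One-character right-extensions of all right-maximal substrings of t.

    Assumed definition: a substring (the empty string included) is
    right-maximal if it is followed by at least two distinct characters in t.
    """
    followers = {}
    n = len(t)
    for i in range(n):
        for j in range(i, n):
            # t[i:j] occurs at position i and is followed by t[j]
            followers.setdefault(t[i:j], set()).add(t[j])
    return {
        alpha + c
        for alpha, chars in followers.items()
        if len(chars) >= 2
        for c in chars
    }


def is_suffixient(t, positions, exts):
    """Check that every extension is a suffix of some prefix t[:x], x in positions."""
    return all(any(t[:x].endswith(e) for x in positions) for e in exts)


def chi_bruteforce(t):
    """Return (size, positions) of a smallest suffixient set, 1-based positions.

    Exhaustive search over position subsets by increasing size; only
    suitable for very short strings.
    """
    exts = right_extensions(t)
    for k in range(len(t) + 1):
        for subset in combinations(range(1, len(t) + 1), k):
            if is_suffixient(t, subset, exts):
                return k, subset


if __name__ == "__main__":
    size, positions = chi_bruteforce("aaaa$")
    print(size, positions)
```

Under this assumed definition, the toy input `aaaa$` needs two positions: one prefix ending in `a` to cover the extensions `a`, `aa`, `aaa`, `aaaa`, and the full string to cover those ending in `$`. Whether the text carries a terminator such as `$` is a convention that varies between papers, and it changes the value on some inputs.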
Problem

Research questions and friction points this paper is trying to address.

string representation
suffixient set
repetitiveness measure
reachability
compressed representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

suffixient set
substring equation system
string repetitiveness
compressed representation
χ measure
Hiroki Shibata
Joint Graduate School of Mathematics for Innovation, Kyushu University, Japan
Hideo Bannai
M&D Data Science Center, Institute of Science Tokyo, Japan
string algorithms, combinatorics on words