Online computation of normalized substring complexity

📅 2025-10-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the online computation of the normalized substring complexity δ of a streaming string, that is, maintaining the maximum over all substring lengths k of c[k]/k (the number of distinct substrings of length k, divided by k) as characters arrive one at a time. We propose two online algorithms: (1) one that runs in O(log n) amortized time per character; and (2) a data-structure-driven method, incorporating dynamic suffix array maintenance and interval extremum queries, that guarantees O(log³ n) worst-case time per character. To our knowledge, this is the first polylogarithmic-time online solution for computing δ, overcoming prior limitations requiring O(n²) preprocessing or offline processing. Experimental evaluation confirms scalability to large-scale streaming strings, enabling real-time compression-feature analysis. The framework provides a novel tool for streaming text mining and lightweight entropy estimation.

📝 Abstract
The normalized substring complexity $\delta$ of a string is defined as $\max_k \{c[k]/k\}$, where $c[k]$ is the number of \textit{distinct} substrings of length $k$. This simply defined measure has recently attracted attention due to its established relationship to popular string compression algorithms. We consider the problem of computing $\delta$ online, when the string is provided from a stream. We present two algorithms solving the problem: one working in $O(\log n)$ amortized time per character, and the other in $O(\log^3 n)$ worst-case time per character. To our knowledge, this is the first polylog-time online solution to this problem.
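To make the definition concrete, the following is a minimal brute-force sketch that evaluates $\max_k \{c[k]/k\}$ directly. It is an illustration of the definition only, not either of the paper's algorithms; the function name `delta_bruteforce` is hypothetical, and the approach takes time polynomial in the string length rather than polylogarithmic time per character.

```python
from fractions import Fraction


def delta_bruteforce(s: str) -> Fraction:
    """Return max over k of c[k]/k, where c[k] is the number of
    distinct substrings of s of length k (0 for the empty string)."""
    n = len(s)
    best = Fraction(0)
    for k in range(1, n + 1):
        # Collect every length-k substring; the set keeps only distinct ones.
        distinct = {s[i:i + k] for i in range(n - k + 1)}
        best = max(best, Fraction(len(distinct), k))
    return best


# In "0001011100" all 8 binary strings of length 3 occur, so c[3]/3 = 8/3,
# which exceeds c[1]/1 = 2 and c[2]/2 = 2.
print(delta_bruteforce("0001011100"))  # 8/3
```

Exact fractions are used here so that ratios for different $k$ can be compared without floating-point rounding.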
Problem

Research questions and friction points this paper is trying to address.

Online computation of normalized substring complexity from data streams
Real-time calculation of δ using polylog-time algorithms
First polylog-time online solution for the maximum distinct-substring ratio max_k c[k]/k
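The online setting described above can be sketched with a naive baseline that updates δ after every arriving character. This is not the paper's method: the class name `NaiveOnlineDelta` is hypothetical, and each update costs time at least linear in the current length, which is exactly the cost a polylog-time algorithm avoids. It relies only on the observation that appending a character can introduce new substrings only as suffixes of the current string.

```python
from fractions import Fraction


class NaiveOnlineDelta:
    """Naive streaming baseline (an illustrative assumption, not the
    paper's algorithm): keeps a set of distinct substrings per length."""

    def __init__(self) -> None:
        self.text = ""
        self.seen = [set()]  # seen[k] holds distinct length-k substrings; index 0 unused
        self.delta = Fraction(0)

    def append(self, ch: str) -> Fraction:
        """Consume one character and return the updated value of delta."""
        self.text += ch
        n = len(self.text)
        self.seen.append(set())  # slot for the new maximum length n
        # Every substring that is new after this step ends at the last
        # position, so it suffices to insert the suffix of each length k.
        for k in range(1, n + 1):
            self.seen[k].add(self.text[n - k:])
        self.delta = max(Fraction(len(self.seen[k]), k) for k in range(1, n + 1))
        return self.delta


stream = NaiveOnlineDelta()
deltas = [stream.append(ch) for ch in "0001011100"]
print(deltas[-1])  # 8/3
```

Because each c[k] can only grow as the string is extended, the reported value never decreases from one character to the next.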
Innovation

Methods, ideas, or system contributions that make the work stand out.

Online algorithm computes normalized substring complexity
Two methods achieve polylog-time per character processing
First polylog-time solution for streaming substring complexity