🤖 AI Summary
This work addresses the cost and latency of randomly accessing a specific data strand in DNA-based storage, measured by the expected number of sequencing reads (coverage depth) needed to recover it. Using algebraic coding theory and probabilistic analysis, the authors give an algorithm that computes this expectation exactly in $O(n)$ time for fixed field size $q$ and information length $k$, and derive explicit formulas for the average and maximum expected number of reads that make the search for optimal generator matrices efficient for small parameters. New code constructions lower the best-known upper bound from $0.8815k$ to $0.8811k$ for $k = 3$ and attain $0.8629k$ for $k = 4$ for sufficiently large $q$. A tighter lower bound is also established, which implies that the simple parity code is optimal for $n = k + 1$ over any alphabet size $q$.
📝 Abstract
As DNA data storage moves closer to practical deployment, minimizing sequencing coverage depth is essential to reduce both operational costs and retrieval latency. This paper addresses the recently studied Random Access Problem, which evaluates the expected number of read samples required to recover a specific information strand from $n$ encoded strands. We propose a novel algorithm to compute the exact expected number of reads, achieving a computational complexity of $O(n)$ for fixed field size $q$ and information length $k$. Furthermore, we derive explicit formulas for the average and maximum expected number of reads, enabling an efficient search for optimal generator matrices for small parameters. Beyond theoretical analysis, we present new code constructions that improve the best-known upper bound from $0.8815k$ to $0.8811k$ for $k=3$, and achieve an upper bound of $0.8629k$ for $k=4$ for sufficiently large $q$. We also establish a tighter theoretical lower bound on the expected number of reads that improves upon state-of-the-art bounds. In particular, this bound establishes the optimality of the simple parity code for the case $n=k+1$ for any alphabet size $q$.
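To make the quantity in the abstract concrete, the following is a minimal brute-force sketch and not the paper's $O(n)$ algorithm. It assumes the usual formulation of the Random Access Problem, which the abstract does not spell out: reads are drawn uniformly at random with replacement from the $n$ encoded strands (the columns of a $k \times n$ generator matrix over a prime field), and information strand $i$ is recovered once the unit vector $e_i$ lies in the span of the columns seen so far. Under that assumption, a coupon-collector decomposition gives $E[T] = \sum_{s=0}^{n-1} P_s \cdot \frac{n}{n-s}$, where $P_s$ is the fraction of $s$-subsets of columns that do not recover $e_i$. The function names `rank_mod_p`, `expected_reads`, and `parity_code` are hypothetical, and the enumeration is exponential in $n$, so it is only suitable for very small parameters.

```python
from fractions import Fraction
from itertools import combinations


def rank_mod_p(vectors, p):
    """Rank of a list of vectors over GF(p), p prime, via Gaussian elimination."""
    rows = [[x % p for x in v] for v in vectors]
    rank = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        inv = pow(rows[rank][col], -1, p)  # modular inverse (Python 3.8+)
        rows[rank] = [(x * inv) % p for x in rows[rank]]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                f = rows[r][col]
                rows[r] = [(a - f * b) % p for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank


def expected_reads(columns, target, p):
    """Exact expected number of uniform reads until e_target is in the span.

    Assumed model: each read is one of the n columns, uniform with replacement.
    Brute force over all subsets, so only usable for small n.
    """
    n, k = len(columns), len(columns[0])
    unit = [1 if j == target else 0 for j in range(k)]
    total = Fraction(0)
    for s in range(n):
        subsets = list(combinations(columns, s))
        # A subset fails if adding e_target increases the rank (not yet in span).
        failing = sum(
            1 for sub in subsets
            if rank_mod_p(list(sub) + [unit], p) != rank_mod_p(list(sub), p)
        )
        # Expected wait to see a new distinct column after s distinct ones: n/(n-s).
        total += Fraction(failing, len(subsets)) * Fraction(n, n - s)
    return total


def parity_code(k):
    """Columns of the simple parity code: e_1..e_k plus the all-ones column."""
    identity = [[1 if i == j else 0 for i in range(k)] for j in range(k)]
    return identity + [[1] * k]


if __name__ == "__main__":
    k = 3
    print(expected_reads(parity_code(k), target=0, p=2))       # parity code, n = k + 1
    print(expected_reads(parity_code(k)[:k], target=0, p=2))   # identity code, n = k
```

Under this assumed model, both calls evaluate to exactly $k = 3$ reads: with a single redundant strand, the parity code matches the uncoded baseline of $k$ reads. The constructions reported in the abstract, which reach below $0.89k$, presumably rely on more redundancy than $n = k+1$.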