AI Summary
This work investigates the exact error exponent of concatenated coding in DNA-based data storage, focusing on how the error probability decays asymptotically at constant coding rates with a linear or super-linear number of sequencing reads. Methodologically, it combines information-theoretic analysis, large deviations theory, and concatenated code design. The contributions are threefold: (i) it derives the exact error exponent in the constant-rate, linear-read regime, strictly improving on an existing achievable error exponent; (ii) it shows, via a coded-index achievability analysis, that coded-index strategies attain the highest error exponent within the class of codes considered; and (iii) it reveals that, in low-rate regimes, codewords with all distinct molecules can be suboptimal and the error exponents depend on the model for sequencing errors. The results cover the linear-read regime, the super-linear-read regime, and several low-rate regimes. Theoretically, they establish that coded indexing attains the best error exponent within this class, thereby tightening the known performance characterization for DNA storage codes of this type.
Abstract
In this paper, we consider a concatenated-coding-based class of DNA storage codes in which the selected molecules are constrained to be taken from an ``inner'' codebook associated with the sequencing channel. This codebook is used in a ``black-box'' manner, and is only assumed to operate at an achievable rate in the sense of attaining asymptotically vanishing maximal (inner) error probability. We first derive the exact error exponent in a widely studied regime of constant rate and a linear number of sequencing reads, and show strict improvements over an existing achievable error exponent. Moreover, our achievability analysis is based on a coded-index strategy, implying that such strategies attain the highest error exponents within the broader class of codes that we consider. We then extend our results to other scaling regimes, including a super-linear number of reads, as well as several low-rate regimes. We find that the latter come with notable intricacies, such as the suboptimality of codewords with all distinct molecules, and certain dependencies of the error exponents on the model for sequencing errors.