Exact Error Exponents of Concatenated Codes for DNA Storage

📅 2024-09-02

🏛️ arXiv.org

📈 Citations: 1

✨ Influential: 1

🤖 AI Summary

This work investigates the exact error exponent of concatenated codes for DNA-based data storage. It focuses on the asymptotic decay of the error probability at a constant coding rate with a linear or super-linear number of sequencing reads. Methodologically, it combines information-theoretic analysis and large-deviations techniques with a concatenated code design in which the inner codebook is treated as a black box. The contributions are threefold: (i) it derives the exact error exponent in the constant-rate, linear-reads regime, which strictly improves on an existing achievable error exponent; (ii) because the achievability analysis uses a coded-index strategy, it shows that such strategies attain the highest error exponent within the considered class of codes; and (iii) it shows that in certain low-rate regimes, codewords with all distinct molecules can be suboptimal and the error exponents can depend on the model for sequencing errors. The results cover the linear and super-linear read regimes as well as several low-rate regimes.
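For readers new to the terminology, the quantities can be written as follows. This is a minimal sketch that assumes the common DNA-storage setup with $M$ molecules and $N$ sequencing reads, and assumes the exponent is normalized by $M$; the paper's exact notation and normalization may differ.

```latex
% Error exponent: exponential decay rate of the error probability P_e
% (normalization by the number of molecules M is an assumption here)
E = \liminf_{M \to \infty} -\frac{1}{M} \log P_e ,
\qquad \text{so that} \qquad P_e \approx e^{-M E} .

% Read-depth regimes for the number of reads N
\text{linear: } N = \alpha M \ (\alpha > 0 \text{ fixed}), \qquad
\text{super-linear: } N / M \to \infty .
```

A larger exponent $E$ means the error probability vanishes faster.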

📝 Abstract

In this paper, we consider a concatenated coding based class of DNA storage codes in which the selected molecules are constrained to be taken from an "inner" codebook associated with the sequencing channel. This codebook is used in a "black-box" manner, and is only assumed to operate at an achievable rate in the sense of attaining asymptotically vanishing maximal (inner) error probability. We first derive the exact error exponent in a widely-studied regime of constant rate and a linear number of sequencing reads, and show strict improvements over an existing achievable error exponent. Moreover, our achievability analysis is based on a coded-index strategy, implying that such strategies attain the highest error exponents within the broader class of codes that we consider. We then extend our results to other scaling regimes, including a super-linear number of reads, as well as several certain low-rate regimes. We find that the latter comes with notable intricacies, such as the suboptimality of codewords with all distinct molecules, and certain dependencies of the error exponents on the model for sequencing errors.
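With a linear number of reads, a constant fraction of molecules is never sequenced at all. The short Python sketch here assumes reads are drawn uniformly at random with replacement (a simplified sampling model, not necessarily the paper's exact channel) and estimates the fraction of unread molecules, comparing it with $e^{-\alpha}$, the limit of $(1 - 1/M)^{\alpha M}$. The function name `missing_fraction` is hypothetical.

```python
import math
import random


def missing_fraction(num_molecules: int, alpha: float, seed: int = 0) -> float:
    """Fraction of molecules never read when N = alpha * M reads are drawn
    uniformly at random with replacement (simplified, assumed sampling model)."""
    rng = random.Random(seed)
    num_reads = int(alpha * num_molecules)
    seen = set()
    for _ in range(num_reads):
        # Each read picks one of the M stored molecules uniformly at random.
        seen.add(rng.randrange(num_molecules))
    return 1.0 - len(seen) / num_molecules


if __name__ == "__main__":
    M = 20_000
    for alpha in (0.5, 1.0, 2.0, 4.0):
        empirical = missing_fraction(M, alpha)
        print(f"alpha={alpha}: empirical={empirical:.4f}, "
              f"exp(-alpha)={math.exp(-alpha):.4f}")
```

Since roughly an $e^{-\alpha}$ fraction of molecules is lost in this model, an outer code over molecules must cope with erasures, which suggests why concatenated designs are natural in the linear-reads regime.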

Problem

Exact error exponents for concatenated DNA storage codes

Improving error exponents in the constant-rate, linear-reads regime

Analyzing error exponents in super-linear and low-rate regimes

Innovation

Concatenated coding for DNA storage

Coded-index strategy for error exponents

Extension to super-linear-read and low-rate regimes
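The index-based, concatenated structure listed above can be illustrated with a toy Python sketch. It assumes a plain uncoded binary index prefix and noise-free reads, so it is much simpler than the coded-index strategy analyzed in the paper and is not meant to reproduce it; all function names are hypothetical. Indices never observed among the reads are returned as erasures (`None`) for a hypothetical outer code to handle.

```python
import random
from typing import List, Optional


def encode(payloads: List[str], index_bits: int) -> List[str]:
    """Prefix each payload with a fixed-width binary index (toy raw-index scheme)."""
    return [format(i, f"0{index_bits}b") + p for i, p in enumerate(payloads)]


def sample_reads(molecules: List[str], num_reads: int, seed: int = 0) -> List[str]:
    """Draw reads uniformly with replacement; no sequencing errors (assumption)."""
    rng = random.Random(seed)
    return [rng.choice(molecules) for _ in range(num_reads)]


def decode(reads: List[str], num_molecules: int,
           index_bits: int) -> List[Optional[str]]:
    """Place each read's payload at its index; unseen indices stay None (erasures)."""
    recovered: List[Optional[str]] = [None] * num_molecules
    for read in reads:
        idx = int(read[:index_bits], 2)
        if idx < num_molecules:  # guard against out-of-range indices
            recovered[idx] = read[index_bits:]
    return recovered


if __name__ == "__main__":
    payloads = ["0101", "1100", "0011", "1111"]
    molecules = encode(payloads, index_bits=2)
    reads = sample_reads(molecules, num_reads=6, seed=1)
    print(decode(reads, num_molecules=4, index_bits=2))
```

In a real system the inner code would also have to correct sequencing errors within each read, and the outer code would fill in the erased positions.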
