🤖 AI Summary
This work addresses the problem of reliable data recovery in DNA storage systems employing MDS-based concatenated coding (e.g., Reed–Solomon codes) under independent and identically distributed substitution errors. We establish the first end-to-end analytical model for the success probability of error detection and correction. Our method integrates information-theoretic analysis, probabilistic modeling, and DNA channel characteristics to quantify how sequencing read count, inter-strand read distribution, inner and outer code rates, and substitution error rate jointly affect recovery reliability. Key contributions include: (i) uncovering a fundamental trade-off between inner and outer code rates; (ii) deriving a closed-form expression for the minimum required number of reads to guarantee high-reliability recovery; and (iii) providing a computationally tractable criterion for optimal code rate allocation. This work lays a rigorous theoretical foundation and offers practical design guidelines for joint coding and sequencing optimization in high-reliability, low-overhead DNA storage systems.
📝 Abstract
This work presents a theoretical analysis of the probability of successfully retrieving data encoded with MDS codes (e.g., Reed–Solomon codes) in DNA storage systems. We study this probability under independent and identically distributed (i.i.d.) substitution errors, focusing on a common code design strategy that combines inner and outer MDS codes. Our analysis demonstrates how this probability depends on the total number of sequencing reads, their distribution across strands, the rates of the inner and outer codes, and the substitution error probabilities. These results provide actionable insights into optimizing DNA storage systems under reliability constraints, including determining the minimum number of sequencing reads needed for reliable data retrieval and identifying the optimal balance between the rates of the inner and outer MDS codes.
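As a rough illustration of how the inner and outer code rates interact, the Python sketch below computes a retrieval success probability under a deliberately simplified model that is not the paper's own derivation. It assumes that each strand is decoded from a single consensus sequence whose symbols are wrong independently with probability `p_sub`, that the inner decoder is a bounded-distance decoder correcting up to floor((n_in - k_in)/2) symbol errors, and that every inner decoding failure is detected and handed to the outer code as an erasure (miscorrection is ignored). The number of sequencing reads and their distribution across strands are not modeled here. All function and parameter names are hypothetical.

```python
from math import comb


def binom_cdf(n, p, t):
    """Return P[X <= t] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(t + 1))


def inner_failure_prob(n_in, k_in, p_sub):
    """Probability that one strand fails inner decoding.

    Assumption: an (n_in, k_in) MDS code with bounded-distance decoding
    corrects at most floor((n_in - k_in) / 2) symbol errors.
    """
    t = (n_in - k_in) // 2
    return 1.0 - binom_cdf(n_in, p_sub, t)


def outer_success_prob(n_out, k_out, q_strand):
    """Probability that the outer code recovers the data.

    Assumption: failed strands are seen as erasures, and an (n_out, k_out)
    MDS code tolerates up to n_out - k_out erasures.
    """
    return binom_cdf(n_out, q_strand, n_out - k_out)


def retrieval_success_prob(n_in, k_in, n_out, k_out, p_sub):
    """End-to-end success probability under the simplified model above."""
    q = inner_failure_prob(n_in, k_in, p_sub)
    return outer_success_prob(n_out, k_out, q)


if __name__ == "__main__":
    # Same outer code, two inner rates: 20/30 versus 26/30.
    for k_in in (20, 26):
        p_ok = retrieval_success_prob(30, k_in, 100, 80, p_sub=0.05)
        print(f"inner rate {k_in}/30 -> success probability {p_ok:.4f}")
```

Even in this toy setting the trade-off is visible: a lower inner rate reduces the per-strand failure probability, so the outer code needs less redundancy to reach the same reliability, and vice versa. Sweeping `k_in` and `k_out` at a fixed overall rate is one way to explore that balance numerically.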