Robust Composite DNA Storage under Sampling Randomness, Substitution, and Insertion-Deletion Errors

📅 2026-02-12
🤖 AI Summary
This work addresses the limited reliability of composite DNA data storage under sampling randomness, base substitution, and insertion–deletion errors by modeling the storage process as a multinomial channel. It represents composite nucleotide letters as points on the three-dimensional probability simplex, which allows them to be treated like constellation points in digital modulation. The proposed scheme combines LDPC channel coding, log-likelihood ratio (LLR)-based soft information derived from the channel transition probabilities, and constellation update rules that account for substitution and insertion–deletion errors. Numerical results show that, with existing LDPC codes, the method performs reliably, whereas prior schemes designed for limited-magnitude probability errors degrade significantly under sampling randomness.

📝 Abstract
DNA data storage offers a high-density, long-term alternative to traditional storage systems, addressing the exponential growth of digital data. Composite DNA extends this paradigm by leveraging mixtures of nucleotides to increase storage capacity beyond the four standard bases. In this work, we model composite DNA storage as a multinomial channel and draw an analogy to digital modulation by representing composite letters on the three-dimensional probability simplex. To mitigate errors caused by sampling randomness, we derive transition probabilities and log-likelihood ratios (LLRs) for each constellation point and employ practical channel codes for error correction. We then extend this framework to substitution and insertion-deletion (ID) channels, proposing constellation update rules that account for these additional impairments. Numerical results demonstrate that our approach achieves reliable performance with existing LDPC codes, in contrast to prior schemes designed for limited-magnitude probability errors, whose performance degrades significantly under sampling randomness.
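
To make the multinomial-channel idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a composite letter is a vector of four base fractions (A, C, G, T) summing to one, so it lies on the three-dimensional probability simplex, and that sequencing returns independent reads of that position, so the read counts follow a multinomial distribution. The four-point constellation, its 2-bit labels, the read counts, and the symmetric substitution model with rate `eps` are all hypothetical choices made for illustration; the paper's actual constellations and update rules are not given on this page.

```python
import math


def multinomial_log_pmf(counts, probs):
    """Log-probability of read counts (A, C, G, T) given base fractions `probs`."""
    n = sum(counts)
    log_p = math.lgamma(n + 1)  # log n!
    for k, p in zip(counts, probs):
        log_p -= math.lgamma(k + 1)  # subtract log k!
        if k > 0:
            if p <= 0.0:
                return -math.inf  # observed a base the letter cannot produce
            log_p += k * math.log(p)
    return log_p


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-values."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def bit_llrs(counts, constellation, labels):
    """Per-bit LLR, log P(counts | bit=0) - log P(counts | bit=1).

    Assumes all constellation points are equally likely a priori.
    """
    log_liks = [multinomial_log_pmf(counts, p) for p in constellation]
    llrs = []
    for b in range(len(labels[0])):
        zero = [ll for ll, lab in zip(log_liks, labels) if lab[b] == 0]
        one = [ll for ll, lab in zip(log_liks, labels) if lab[b] == 1]
        llrs.append(log_sum_exp(zero) - log_sum_exp(one))
    return llrs


def apply_symmetric_substitution(probs, eps):
    """Shift a constellation point under an assumed symmetric substitution model.

    Each read base is replaced by one of the other three bases with total
    probability `eps`, split evenly (an illustrative assumption).
    """
    return tuple((1 - eps) * p + (eps / 3) * (1 - p) for p in probs)


# Hypothetical 4-point constellation with Gray-style 2-bit labels.
CONSTELLATION = [
    (0.7, 0.1, 0.1, 0.1),
    (0.1, 0.7, 0.1, 0.1),
    (0.1, 0.1, 0.7, 0.1),
    (0.1, 0.1, 0.1, 0.7),
]
LABELS = [(0, 0), (0, 1), (1, 1), (1, 0)]

reads = (7, 1, 1, 1)  # hypothetical counts of A, C, G, T from 10 reads
llrs = bit_llrs(reads, CONSTELLATION, LABELS)

# Recompute LLRs against the shifted constellation for a 3% substitution rate.
updated = [apply_symmetric_substitution(p, 0.03) for p in CONSTELLATION]
llrs_sub = bit_llrs(reads, updated, LABELS)

print(llrs, llrs_sub)
```

In a full system these per-bit LLRs would be passed to a soft-decision LDPC decoder. Updating the constellation before computing the LLRs, rather than correcting the reads, keeps the decoder unchanged and confines the channel model to the likelihood computation.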
Problem

Research questions and friction points this paper is trying to address.

composite DNA storage
sampling randomness
substitution errors
insertion-deletion errors
reliability
Innovation

Methods, ideas, or system contributions that make the work stand out.

composite DNA storage
multinomial channel
log-likelihood ratios
insertion-deletion errors
LDPC codes
Busra Tegin
Bilkent University
Tolga M Duman
Department of Electrical and Electronics Engineering, Bilkent University, Ankara, Turkey