DNA-MGC+: A versatile codec for reliable and resource-efficient data storage on synthetic DNA

📅 2026-03-15
🤖 AI Summary
This work addresses the insertion, deletion, and substitution (IDS) errors and sequence loss that synthesis, amplification, and sequencing introduce in DNA data storage. It proposes DNA-MGC+, a codec designed for reliable, resource-efficient data retrieval under diverse operating conditions. Compared with existing codecs, DNA-MGC+ reports simultaneous gains in sequencing depth requirements, read cost, decoding time, storage density, and error-correction capability under explicit reliability constraints. It is evaluated in silico and in vitro, with both Illumina and Nanopore sequencing. Notable results include reliable decoding at IDS error rates of up to 24% in synthetic scenarios, and reliable retrieval at sequencing depths below 3× with read costs below 3.5 bits/nt under electrochemical synthesis on both platforms.

📝 Abstract
The biochemical processes underlying DNA data storage, including synthesis, amplification, and sequencing, are inherently noisy. Consequently, base-level insertion, deletion, and substitution (IDS) errors, as well as sequence-level dropouts, occur and pose major challenges for reliable data retrieval. Here we introduce DNA-MGC+, a DNA storage codec designed to enable reliable and resource-efficient data retrieval under diverse operating conditions. We evaluate DNA-MGC+ across a wide range of in silico and in vitro settings, including experiments with both Illumina and Nanopore sequencing, and show that it consistently outperforms existing codecs. In particular, DNA-MGC+ achieves simultaneous gains in sequencing depth requirements, read cost, decoding time, storage density, and error-correction capability under explicit reliability constraints. Notable results include reliable decoding under IDS error rates of up to 24% in synthetic scenarios, and reliable retrieval at sequencing depths below 3x with read costs below 3.5 bits/nt under electrochemical synthesis for both Illumina and Nanopore sequencing.
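To make the two error types named in the abstract concrete, the following is a minimal sketch of a noisy read-out model. It is not the DNA-MGC+ codec or the evaluation setup used by the authors. The function names `ids_channel` and `read_pool`, the independent per-base error model, and the definition of sequencing depth as reads per designed strand are all assumptions made for illustration.

```python
import random

BASES = "ACGT"


def ids_channel(strand, p_ins, p_del, p_sub, rng):
    """Pass one strand through an i.i.d. IDS channel (assumed toy model)."""
    out = []
    for base in strand:
        r = rng.random()
        if r < p_ins:
            # Insertion: emit a random extra base, then keep the original.
            out.append(rng.choice(BASES))
            out.append(base)
        elif r < p_ins + p_del:
            # Deletion: the base is dropped from the read.
            continue
        elif r < p_ins + p_del + p_sub:
            # Substitution: replace the base with a different one.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def read_pool(strands, depth, p_dropout, p_ins, p_del, p_sub, seed=0):
    """Sample noisy reads from a pool, with sequence-level dropout.

    Assumption: depth is the number of reads per designed strand.
    """
    rng = random.Random(seed)
    # Sequence-level dropout: some strands never appear in any read.
    surviving = [s for s in strands if rng.random() >= p_dropout]
    if not surviving:
        return []
    n_reads = round(depth * len(strands))
    return [
        ids_channel(rng.choice(surviving), p_ins, p_del, p_sub, rng)
        for _ in range(n_reads)
    ]


if __name__ == "__main__":
    pool = ["ACGTACGTAC", "TTGACCAGTA", "GGCATCGATT", "CATGCATGCA"]
    # Hypothetical split of a 24% total IDS rate into three equal parts.
    for read in read_pool(pool, depth=3, p_dropout=0.1,
                          p_ins=0.08, p_del=0.08, p_sub=0.08):
        print(read)
```

A decoder in this setting has to recover the stored data from reads whose lengths vary (because of insertions and deletions) and from a pool in which some strands may be missing entirely, which is why depth and error rate are reported together.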
Problem

Research questions and friction points this paper is trying to address.

DNA data storage
insertion-deletion-substitution errors
sequence dropout
reliable data retrieval
noisy biochemical processes
Innovation

Methods, ideas, or system contributions that make the work stand out.

DNA data storage
error correction
insertion-deletion-substitution errors
resource-efficient codec
sequencing depth
Ramy Khabbaz
Côte d’Azur University, CNRS, I3S, Sophia Antipolis, France
Jérémy Mateos
Côte d’Azur University, CNRS, I3S, Sophia Antipolis, France; Pearcode, Sophia Antipolis, France
Marc Antonini
CNRS and University of Nice-Sophia Antipolis
Image coding, multiresolution analysis, 3D meshes, bio-inspired image processing, DNA storage
Serge Kas Hanna
Junior Professor, CNRS, Côte d'Azur University
Coding Theory, DNA-based Data Storage, Distributed Learning