Achievable Rates of Nanopore-based DNA Storage

📅 2025-08-11
🤖 AI Summary
This work studies nanopore-based DNA data storage in which raw nanopore signals are decoded directly with a tractable channel model rather than a basecalling algorithm. The model, NNC-Scrappie, is the noisy nanopore channel (NNC) combined with the Scrappie pore model: average output levels are duplicated an i.i.d. geometric number of times and corrupted by i.i.d. Gaussian noise. Simplified message-passing algorithms give efficient soft decoding under this model. Because DNA storage datasets that include nanopore signals have been lacking, the authors derive an achievable rate from the dynamic time warping (DTW) algorithm that can be evaluated on genomic sequencing datasets. On a publicly available Oxford Nanopore Technologies (ONT) dataset, coding over multiple 100-base strands achieves at least 0.64 to 1.18 bits per base, depending on the quality of the nanopore used, and 0.96 bits per base on average for uniformly chosen nanopores. These figures cover single reads only, without calibrating the pore model to specific nanopores, so they are pessimistic.

📝 Abstract
This paper studies achievable rates of nanopore-based DNA storage when nanopore signals are decoded using a tractable channel model that does not rely on a basecalling algorithm. Specifically, the noisy nanopore channel (NNC) with the Scrappie pore model generates average output levels via i.i.d. geometric sample duplications corrupted by i.i.d. Gaussian noise (NNC-Scrappie). Simplified message passing algorithms are derived for efficient soft decoding of nanopore signals using NNC-Scrappie. Previously, evaluation of this channel model was limited by the lack of DNA storage datasets with nanopore signals included. This is solved by deriving an achievable rate based on the dynamic time-warping (DTW) algorithm that can be applied to genomic sequencing datasets subject to constraints that make the resulting rate applicable to DNA storage. Using a publicly available dataset from Oxford Nanopore Technologies (ONT), it is demonstrated that coding over multiple DNA strands of $100$ bases in length and decoding with the NNC-Scrappie decoder can achieve rates of at least $0.64$ to $1.18$ bits per base, depending on the channel quality of the nanopore that is chosen in the sequencing device per channel use, and $0.96$ bits per base on average assuming uniformly chosen nanopores. These rates are pessimistic since they only apply to single reads and do not include calibration of the pore model to specific nanopores.
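The channel structure described in the abstract (a mean level per position, an i.i.d. geometric number of duplicated samples, and i.i.d. Gaussian noise) can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's model: `toy_pore_model` is a hypothetical random k-mer lookup table standing in for the Scrappie pore model, and the k-mer length `K`, the geometric parameter `p`, the noise level `sigma`, and the choice of a geometric distribution supported on 1, 2, ... are illustrative values not taken from the source.

```python
import random
from itertools import product

K = 5  # assumed k-mer length, for illustration only


def toy_pore_model(k=K, seed=0):
    """Hypothetical k-mer -> mean level table (NOT the real Scrappie model)."""
    rng = random.Random(seed)
    return {"".join(kmer): rng.uniform(-2.0, 2.0)
            for kmer in product("ACGT", repeat=k)}


def geometric(rng, p):
    """Draw d in {1, 2, ...} with P(d) = (1 - p)**(d - 1) * p."""
    d = 1
    while rng.random() >= p:
        d += 1
    return d


def simulate_nnc(strand, levels, p=0.1, sigma=0.3, seed=1):
    """Emit a noisy signal for a strand: each k-mer's mean level is
    duplicated a geometric number of times, then Gaussian noise is added."""
    rng = random.Random(seed)
    k = len(next(iter(levels)))
    signal, durations = [], []
    for i in range(len(strand) - k + 1):
        mean = levels[strand[i:i + k]]
        d = geometric(rng, p)
        durations.append(d)
        signal.extend(mean + rng.gauss(0.0, sigma) for _ in range(d))
    return signal, durations


if __name__ == "__main__":
    levels = toy_pore_model()
    signal, durations = simulate_nnc("ACGTACGTAC", levels)
    print(len(signal), durations)
```

A soft decoder working with such a model has to account for the unknown durations, which is why the number of samples per position is returned separately here for inspection.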
Problem

Research questions and friction points this paper is trying to address.

Studying achievable rates of nanopore-based DNA storage systems
Developing efficient decoding without basecalling, using the NNC-Scrappie model
Evaluating rates via the DTW algorithm on genomic sequencing datasets
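For readers unfamiliar with DTW, the following sketch shows the textbook dynamic-programming form of the algorithm, aligning a sampled signal to a shorter sequence of reference levels. It is only an illustration of the general technique: the squared-error cost, the unconstrained step pattern, and the function name `dtw` are assumptions, and the paper's rate derivation applies additional constraints that are not reproduced here.

```python
def dtw(signal, reference):
    """Textbook DTW with squared-error cost.

    Returns the total alignment cost and the warping path as a list of
    (signal_index, reference_index) pairs.
    """
    n, m = len(signal), len(reference)
    inf = float("inf")
    # cost[i][j]: best cost of aligning signal[:i] with reference[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (signal[i - 1] - reference[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j],      # repeat reference level
                                 cost[i][j - 1],      # repeat signal sample
                                 cost[i - 1][j - 1])  # advance both
    # Backtrack from the end to recover one optimal path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
    path.reverse()
    return cost[n][m], path


if __name__ == "__main__":
    total, path = dtw([0.0, 0.1, 1.0, 1.1, 0.9, 2.0], [0.0, 1.0, 2.0])
    print(total, path)
```

In the nanopore setting, the reference would be the expected levels for a known sequence, and the path indicates which samples are attributed to which level.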
Innovation

Methods, ideas, or system contributions that make the work stand out.

Decoding nanopore signals without a basecalling algorithm
Using the NNC-Scrappie model for efficient soft decoding
Applying the DTW algorithm to genomic sequencing datasets