🤖 AI Summary
This work addresses nanopore-based DNA data storage without relying on a basecalling algorithm and establishes a framework for evaluating achievable information rates. We use NNC-Scrappie, a noisy nanopore channel model that combines the Scrappie pore model with i.i.d. geometric sample duplications and i.i.d. Gaussian noise, and derive a dynamic time warping (DTW)-based achievable rate that can be evaluated on genomic sequencing datasets. Simplified message passing algorithms enable efficient soft decoding of real Oxford Nanopore Technologies (ONT) signals. On a publicly available ONT dataset, coding over multiple DNA strands of 100 bases achieves rates of at least 0.64 to 1.18 bits per base, depending on the channel quality of the chosen nanopore, and 0.96 bits per base on average for uniformly chosen nanopores. These figures are pessimistic because they apply to single reads only and do not include calibration of the pore model to specific nanopores. The core contribution is a basecalling-free channel model and rate evaluation method, together with practical soft-decoding algorithms for DNA storage.
📝 Abstract
This paper studies achievable rates of nanopore-based DNA storage when nanopore signals are decoded using a tractable channel model that does not rely on a basecalling algorithm. Specifically, the noisy nanopore channel (NNC) with the Scrappie pore model (NNC-Scrappie) takes the average output levels given by the pore model, duplicates each level an i.i.d. geometrically distributed number of times, and corrupts the resulting samples with i.i.d. Gaussian noise. Simplified message passing algorithms are derived for efficient soft decoding of nanopore signals using NNC-Scrappie. Previously, evaluation of this channel model was limited by the lack of DNA storage datasets that include nanopore signals. This limitation is addressed by deriving an achievable rate based on the dynamic time-warping (DTW) algorithm that can be applied to genomic sequencing datasets, subject to constraints that make the resulting rate applicable to DNA storage. Using a publicly available dataset from Oxford Nanopore Technologies (ONT), it is demonstrated that coding over multiple DNA strands of $100$ bases in length and decoding with the NNC-Scrappie decoder can achieve rates of at least $0.64$ to $1.18$ bits per base, depending on the channel quality of the nanopore that is chosen in the sequencing device per channel use, and $0.96$ bits per base on average assuming uniformly chosen nanopores. These rates are pessimistic since they only apply to single reads and do not include calibration of the pore model to specific nanopores.
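To make the channel description concrete, the following is a minimal Python sketch of an NNC-style channel, written under several assumptions that are not taken from the paper. The function `toy_pore_model` is a hypothetical stand-in for the Scrappie pore model (the real level table is not reproduced here), and the k-mer length `K`, the geometric parameter `p`, and the noise standard deviation `sigma` are illustrative values. The `dtw_cost` function is a simple DTW variant that assigns every sample to exactly one level in order. It is meant only to illustrate how a candidate level sequence can be scored against a signal, and it is not the paper's rate computation or decoder.

```python
import random

K = 5  # assumed k-mer length, for illustration only


def toy_pore_model(kmer: str) -> float:
    """Hypothetical stand-in for a pore model: map a k-mer to a mean level."""
    index = 0
    for base in kmer:
        index = index * 4 + "ACGT".index(base)
    # Spread the 4^k possible k-mers evenly over the interval [-2, 2].
    return -2.0 + 4.0 * index / (4 ** len(kmer) - 1)


def kmer_levels(bases: str, k: int = K) -> list[float]:
    """Average output level for each overlapping k-mer of the strand."""
    return [toy_pore_model(bases[i:i + k]) for i in range(len(bases) - k + 1)]


def geometric(rng: random.Random, p: float) -> int:
    """Sample a duplication count from a geometric law on {1, 2, ...}."""
    count = 1
    while rng.random() >= p:
        count += 1
    return count


def simulate_nnc(bases: str, p: float = 0.1, sigma: float = 0.3,
                 seed: int = 0) -> list[float]:
    """Duplicate each level a geometric number of times, then add Gaussian noise."""
    rng = random.Random(seed)
    signal = []
    for level in kmer_levels(bases):
        for _ in range(geometric(rng, p)):
            signal.append(level + rng.gauss(0.0, sigma))
    return signal


def dtw_cost(levels: list[float], signal: list[float]) -> float:
    """Minimum squared error when each level covers one or more consecutive samples."""
    inf = float("inf")
    prev = [inf] * (len(signal) + 1)
    prev[0] = 0.0  # nothing consumed yet
    for level in levels:
        cur = [inf] * (len(signal) + 1)
        for j in range(1, len(signal) + 1):
            # Either this sample starts the current level or extends it.
            best = min(prev[j - 1], cur[j - 1])
            cur[j] = best + (signal[j - 1] - level) ** 2
        prev = cur
    return prev[len(signal)]


if __name__ == "__main__":
    strand = "ACGTACGTAC"
    signal = simulate_nnc(strand)
    print(len(signal), dtw_cost(kmer_levels(strand), signal))
```

In this sketch, scoring the true strand's levels against its own simulated signal should typically give a lower cost than scoring an unrelated strand. That is the intuition behind using an alignment between signals and reference sequences, although the constraints the paper imposes on DTW for the rate to be valid are not modeled here.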