Correcting Multiple Substitutions in Nanopore-Sequencing Reads

📅 2025-05-05
🤖 AI Summary
Nanopore sequencing suffers from high error rates, and several substitutions may occur within the same read, which limits read accuracy. This work studies codes that correct multiple substitutions in read vectors. Under a simplified nanopore channel model that accounts for intersymbol interference and measurement noise, the authors use the graph-theoretic clique-cover technique to prove that correcting $t \geq 2$ substitutions requires at least $t \log n - O(1)$ bits of redundancy, far more than the at most $\log \log n - O(1)$ bits needed for a single substitution. A code whose redundancy is optimal up to a constant then follows directly from the properties of read vectors.

📝 Abstract
Despite their significant advantages over competing technologies, nanopore sequencers are plagued by high error rates, due to physical characteristics of the nanopore and inherent noise in the biological processes. It is thus paramount not only to formulate efficient error-correcting constructions for these channels, but also to establish bounds on the minimum redundancy required by such coding schemes. In this context, we adopt a simplified model of nanopore sequencing inspired by the work of Mao et al., accounting for the effects of intersymbol interference and measurement noise. For an input sequence of length $n$, the vector that is produced, designated as the read vector, may additionally suffer at most $t$ substitution errors. We employ the well-known graph-theoretic clique-cover technique to establish that at least $t \log n - O(1)$ bits of redundancy are required to correct multiple ($t \geq 2$) substitutions. While this is surprising in comparison to the case of a single substitution, which necessitates at most $\log \log n - O(1)$ bits of redundancy, a suitable error-correcting code that is optimal up to a constant follows immediately from the properties of read vectors.
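
To get a feel for the gap between the two redundancy figures quoted above, the following sketch evaluates the leading terms $t \log n$ and $\log \log n$ for a few input lengths. It assumes base-2 logarithms and drops the $O(1)$ terms, neither of which the abstract specifies; the function names are illustrative only.

```python
import math


def multi_substitution_bound(n: int, t: int) -> float:
    """Leading term t * log2(n) of the lower bound for t >= 2 substitutions.

    Assumption: base-2 logarithm; the O(1) term is ignored.
    """
    if t < 2:
        raise ValueError("the multi-substitution bound is stated for t >= 2")
    return t * math.log2(n)


def single_substitution_bound(n: int) -> float:
    """Leading term log2(log2(n)) quoted for a single substitution.

    Assumption: base-2 logarithm; the O(1) term is ignored.
    """
    return math.log2(math.log2(n))


# Compare the two leading terms for a few input lengths n.
for n in (2**10, 2**16, 2**20):
    single = single_substitution_bound(n)
    double = multi_substitution_bound(n, t=2)
    print(f"n = {n:>8}: single ~ {single:.2f} bits, t = 2 ~ {double:.2f} bits")
```

Even for $t = 2$, the multi-substitution term grows linearly in $\log n$, whereas the single-substitution term grows only with $\log \log n$.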
Problem

Research questions and friction points this paper is trying to address.

Correcting multiple substitution errors in nanopore sequencing reads
Establishing bounds on minimum redundancy for error-correcting codes
Modeling intersymbol interference and noise in nanopore sequencing
Innovation

Methods, ideas, or system contributions that make the work stand out.

Graph-theoretic clique-cover technique for lower-bounding redundancy
Model accounting for intersymbol interference and noise
Lower bound of $t \log n - O(1)$ redundancy bits for $t \geq 2$ substitutions, met up to a constant by a code
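
As a rough illustration of the kind of channel model described here, the sketch below builds a read vector for a binary input by sliding a window over the sequence and recording the Hamming weight of each window, then applies substitutions to that vector. The window-weight definition, the zero padding at both ends, the window length `ell`, and every function name are assumptions made for illustration; the abstract does not spell out the exact definition.

```python
from typing import Dict, List, Sequence


def read_vector(x: Sequence[int], ell: int) -> List[int]:
    """Hamming weights of all length-ell windows of the binary sequence x.

    Assumed model: x is zero-padded with ell - 1 symbols on each side, so
    the result has len(x) + ell - 1 entries.
    """
    if ell < 1:
        raise ValueError("window length must be positive")
    padded = [0] * (ell - 1) + list(x) + [0] * (ell - 1)
    return [sum(padded[i:i + ell]) for i in range(len(x) + ell - 1)]


def substitute(read: Sequence[int], errors: Dict[int, int]) -> List[int]:
    """Return a copy of the read vector with the given positions overwritten."""
    noisy = list(read)
    for position, value in errors.items():
        noisy[position] = value
    return noisy


def is_consistent(read: Sequence[int]) -> bool:
    """Check that adjacent entries differ by at most 1.

    Consecutive windows share ell - 1 symbols, so under the assumed
    definition their weights cannot differ by more than 1.
    """
    return all(abs(a - b) <= 1 for a, b in zip(read, read[1:]))


x = [1, 0, 1, 1, 0]
clean = read_vector(x, ell=3)
noisy = substitute(clean, {3: 0})  # one substitution at index 3
print(clean, is_consistent(clean))
print(noisy, is_consistent(noisy))
```

Under this assumed definition, some substitutions are detectable from the structure of the vector alone, which hints at why properties of read vectors can help in code design. This is not the paper's construction.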