🤖 AI Summary
This work addresses the lack of general bounds and efficient constructions for codes over ordered composite DNA sequences with non-binary alphabets and richer error models. We introduce a general t-(e₁,…,eₜ)-composite error/deletion channel model that applies to any alphabet size q and resolution parameter k. Using equivalence relations among code families to reduce the number of cases, together with refined sphere-packing arguments and probabilistic methods, we derive upper bounds on the sizes of composite-error correcting codes (CECCs) that together cover all values of q, k, (e₁,…,eₖ) and e; previous bounds were established only for q=2 and limited parameter choices. For composite-deletion correcting codes (CDCCs), we generalize a known non-asymptotic upper bound for (1,0,…,0)-CDCCs and give a cleaner asymptotic bound. On the constructive side, we propose CDCCs with near-optimal redundancy and efficient systematic encoders for any q ≥ 2, give the first explicit encoding and decoding algorithms for the binary (1,0,…,0)-CECC of Dollma et al. and extend them to general q, and present systematic constructions of binary and non-binary 1-CECCs and of t-(1,…,1)-CECCs.
📝 Abstract
This paper extends the foundational work of Dollma \emph{et al.} on codes for ordered composite DNA sequences. We consider the general setting with an alphabet of size $q$ and a resolution parameter $k$, moving beyond the binary ($q=2$) case primarily studied previously. We investigate error-correcting codes for substitution errors and deletion errors under several channel models, including $(e_1,\ldots,e_k)$-composite error/deletion, $e$-composite error/deletion, and the newly introduced $t$-$(e_1,\ldots,e_t)$-composite error/deletion model.
We first establish equivalence relations among families of composite-error correcting codes (CECCs) and among families of composite-deletion correcting codes (CDCCs). This significantly reduces the number of distinct error-parameter sets that require separate analysis. We then derive novel and general upper bounds on the sizes of CECCs using refined sphere-packing arguments and probabilistic methods. These bounds together cover all values of parameters $q$, $k$, $(e_1,\ldots,e_k)$ and $e$. In contrast, previous bounds were only established for $q=2$ and limited choices of $k$, $(e_1,\ldots,e_k)$ and $e$. For CDCCs, we generalize a known non-asymptotic upper bound for $(1,0,\ldots,0)$-CDCCs and then provide a cleaner asymptotic bound.
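For orientation, the classical sphere-packing (Hamming) bound that such arguments refine can be computed in a few lines. The sketch below is only the textbook bound for ordinary $q$-ary substitution errors in the Hamming metric, not the composite-channel bound derived in the paper; the function names `hamming_ball_size` and `sphere_packing_bound` are illustrative and do not come from the source.

```python
from math import comb


def hamming_ball_size(q: int, n: int, e: int) -> int:
    """Number of length-n words over a q-ary alphabet within Hamming distance e of a fixed word."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(e + 1))


def sphere_packing_bound(q: int, n: int, e: int) -> int:
    """Classical upper bound on the size of a q-ary length-n code correcting e substitutions.

    Assumption: plain Hamming-metric channel, not the composite model of the paper.
    The radius-e balls around codewords must be disjoint, so |C| * ball <= q^n.
    """
    return q ** n // hamming_ball_size(q, n, e)


if __name__ == "__main__":
    # Binary, length 7, single substitution: 2^7 / (1 + 7) = 16.
    print(sphere_packing_bound(2, 7, 1))
```

In the composite setting the error balls depend on the parameters $(e_1,\ldots,e_k)$ and on the transmitted word, which is presumably why a refined argument is required rather than this uniform-ball count.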
On the constructive side, for any $q\ge2$, we propose $(1,0,\ldots,0)$-CDCCs, $1$-CDCCs and $t$-$(1,\ldots,1)$-CDCCs with near-optimal redundancies. These codes have efficient and systematic encoders. For substitution errors, we design the first explicit encoding and decoding algorithms for the binary $(1,0,\ldots,0)$-CECC constructed by Dollma \emph{et al.}, and extend the approach to general $q$. Furthermore, we give an improved construction of binary $1$-CECCs, a construction of nonbinary $1$-CECCs, and a construction of $t$-$(1,\ldots,1)$-CECCs. These constructions are also systematic.