🤖 AI Summary
This work proposes low-complexity error-correcting codes for multiple bursts of $(t_1,t_2)$-DI errors, where each burst consists of $t_1$ deletions followed by $t_2$ insertions in a binary sequence. Such errors arise in DNA data storage and document synchronization. The authors show that three classes of codes are equivalent: codes correcting two $(t_1,t_2)$-DI bursts, codes correcting two $(t_2,t_1)$-DI bursts, and codes correcting one burst of each type. They then derive lower and upper bounds on the size of codes correcting two $(t_1,t_2)$-DI bursts, which extend naturally to more than two bursts, and present explicit code constructions. Compared with codes obtained through syndrome compression, the constructed codes have significantly lower computational complexity.
📝 Abstract
Burst errors involving simultaneous insertions, deletions, and substitutions occur in practical scenarios, including DNA data storage and document synchronization, motivating the development of channel codes that can correct such errors. In this paper, we address the problem of constructing error-correcting codes (ECCs) capable of handling multiple bursts of $t_1$-deletion-$t_2$-insertion ($(t_1,t_2)$-DI) errors, where each burst consists of $t_1$ deletions followed by $t_2$ insertions in a binary sequence. We make three key contributions. First, we establish the fundamental equivalence of (1) ECCs correcting two bursts of $(t_1,t_2)$-DI errors, (2) ECCs correcting two bursts of $(t_2,t_1)$-DI errors, and (3) ECCs correcting one burst each of $(t_1,t_2)$-DI and $(t_2,t_1)$-DI errors. Second, we derive lower and upper bounds on the size of ECCs correcting two bursts of $(t_1,t_2)$-DI errors, which extend naturally to the case of multiple bursts. Third, we present constructions of ECCs correcting two bursts of $(t_1,t_2)$-DI errors. Compared to codes obtained by the syndrome compression technique, the resulting codes achieve significantly lower computational complexity.
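To make the error model concrete, here is a minimal Python sketch that enumerates every sequence reachable from a binary string by a single $(t_1,t_2)$-DI burst. It assumes the common reading of the definition, in which the $t_1$ deleted symbols are consecutive and the $t_2$ inserted symbols are placed at the same position; the abstract does not spell this out. The function name `di_burst_ball` is hypothetical and does not come from the paper.

```python
from itertools import product


def di_burst_ball(x: str, t1: int, t2: int) -> set[str]:
    """Return all strings reachable from x by one (t1, t2)-DI burst.

    Assumed model (not stated in the abstract): t1 consecutive symbols
    are deleted and t2 arbitrary binary symbols are inserted in their place.
    """
    if len(x) < t1:
        return set()  # not enough symbols to delete
    # Every binary string of length t2 is a possible inserted block.
    inserts = ["".join(bits) for bits in product("01", repeat=t2)]
    return {
        x[:i] + w + x[i + t1:]          # replace the window x[i:i+t1] with w
        for i in range(len(x) - t1 + 1)
        for w in inserts
    }


ball = di_burst_ball("0000", 2, 1)
print(sorted(ball))  # ['000', '001', '010', '100']
```

Under this model, a single $(t_1,t_2)$-DI burst is undone by a single $(t_2,t_1)$-DI burst: if `y` is in `di_burst_ball(x, t1, t2)`, then `x` is in `di_burst_ball(y, t2, t1)`. This is only a one-burst sanity check, not the two-burst equivalence result established in the paper.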