AI Summary
Efficient error correction for insertion, deletion, and substitution (IDS) errors remains a major challenge in DNA-based data storage. Method: This paper proposes THEA-code, the first end-to-end trainable, channel-customized IDS error-correcting code framework. It innovatively integrates Gumbel-Softmax relaxation for discrete code optimization with a differentiable IDS channel model, overcoming the non-differentiability and poor generalizability of conventional constructive coding schemes. Built upon an autoencoder architecture, THEA-code jointly optimizes codeword generation, channel simulation, and decoding. Contribution/Results: Experiments demonstrate that THEA-code significantly improves decoding accuracy and robustness across diverse, realistic IDS channel models. It supports flexible adaptation to varying channel characteristics without manual code redesign, establishing a novel, learnable paradigm for efficient error correction in DNA storage.
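To make the three IDS error types concrete, here is a minimal sketch of a conventional (non-differentiable) IDS channel over the DNA alphabet. It is an illustration only and not the differentiable channel model from the paper; the function name `ids_channel`, the per-base error probabilities, and the choice to place an inserted base before the current one are all assumptions made for this example.

```python
import random

ALPHABET = "ACGT"

def ids_channel(seq, p_ins=0.01, p_del=0.01, p_sub=0.01, rng=None):
    """Pass a DNA string through a toy IDS channel (hypothetical helper).

    Each base independently suffers at most one of: an insertion of a
    random base before it, a deletion, or a substitution by a different base.
    """
    rng = rng or random.Random()
    out = []
    for base in seq:
        r = rng.random()  # uniform in [0, 1)
        if r < p_ins:
            # Insertion: emit a random extra base, then keep the original.
            out.append(rng.choice(ALPHABET))
            out.append(base)
        elif r < p_ins + p_del:
            # Deletion: drop the base entirely.
            continue
        elif r < p_ins + p_del + p_sub:
            # Substitution: replace with one of the three other bases.
            out.append(rng.choice([b for b in ALPHABET if b != base]))
        else:
            # No error.
            out.append(base)
    return "".join(out)

if __name__ == "__main__":
    print(ids_channel("ACGTACGTACGT", p_ins=0.1, p_del=0.1, p_sub=0.1))
```

Because insertions and deletions change the sequence length and shift every later position, the sampling step above has no useful gradient, which is the obstacle a differentiable channel model is meant to address.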
Abstract
Insertion, deletion, and substitution (IDS) error-correcting codes have garnered increased attention with recent advancements in DNA storage technology. However, a universal method for designing IDS-correcting codes across varying channel settings remains underexplored. We present an autoencoder-based method, THEA-code, aimed at efficiently generating IDS-correcting codes for complex IDS channels. In this work, a Gumbel-Softmax discretization constraint is proposed to discretize the features of the autoencoder, and a simulated IDS channel is developed as a differentiable alternative to IDS operations. These innovations enable the autoencoder to converge, yielding channel-customized IDS-correcting codes that perform well across complex IDS channels.
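For readers unfamiliar with the Gumbel-Softmax trick mentioned above, the following is a minimal NumPy sketch of the standard relaxed sampling step. It shows the general technique only, not the specific discretization constraint proposed in the paper; the function name `gumbel_softmax`, the temperature parameter `tau`, and the four-symbol example are assumptions chosen for illustration.

```python
import numpy as np

def gumbel_softmax(logits, tau=1.0, rng=None):
    """Draw a relaxed one-hot sample per row of `logits` (illustrative sketch).

    Lower `tau` pushes each row closer to a one-hot vector; higher `tau`
    pushes it toward a uniform distribution.
    """
    rng = rng or np.random.default_rng()
    # Clip uniform noise away from 0 and 1 so both logarithms stay finite.
    u = np.clip(rng.uniform(size=logits.shape), 1e-12, 1.0 - 1e-12)
    gumbel = -np.log(-np.log(u))
    y = (logits + gumbel) / tau
    # Numerically stable softmax over the last axis.
    y = y - y.max(axis=-1, keepdims=True)
    e = np.exp(y)
    return e / e.sum(axis=-1, keepdims=True)

if __name__ == "__main__":
    # Five codeword positions, four symbols each (e.g. A, C, G, T).
    logits = np.zeros((5, 4))
    soft = gumbel_softmax(logits, tau=0.5)
    print(soft.round(3))
    print(soft.argmax(axis=-1))  # hard symbol choice per position
```

In an autoencoder setting, the soft rows can be passed through differentiable layers during training, while the `argmax` gives the discrete symbols that would actually be written as a codeword.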