Efficient Transformer-based Decoder for Varshamov-Tenengolts Codes

📅 2025-02-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
Traditional VT-code decoders for DNA storage support only single insertion/deletion/substitution (IDS) error correction, failing to handle multiple IDS errors—a critical bottleneck for reliable DNA data storage. To address this, we propose TVTD, the first Transformer-based soft decoder for VT codes. TVTD innovatively integrates Transformer architecture into VT decoding, jointly modeling symbol-level and statistical-level embeddings within a soft-input soft-output (SISO) framework to enable robust joint correction of multiple IDS errors. Experiments demonstrate 100% single-error correction accuracy; significantly lower bit/frame error rates under multiple IDS errors compared to state-of-the-art hard-decision and soft-decision baselines; and a 10× speedup in inference latency. This work breaks the long-standing single-error limitation of VT codes, establishing the first scalable, deep learning–enabled soft decoding paradigm for high-reliability DNA storage.

📝 Abstract
In recent years, the rise of DNA data storage technology has brought significant attention to the challenge of correcting insertion, deletion, and substitution (IDS) errors. Among various coding methods for IDS correction, Varshamov-Tenengolts (VT) codes, primarily designed for single-error correction, have emerged as a central research focus. While existing decoding methods achieve high accuracy in correcting a single error, they often fail to correct multiple IDS errors. In this work, we observe that VT codes retain some capability for correcting multiple errors, and we exploit it by introducing a transformer-based VT decoder (TVTD) together with symbol- and statistic-based codeword embeddings. Experimental results demonstrate that the proposed TVTD achieves perfect correction of a single error. Furthermore, when decoding multiple errors across various codeword lengths, the bit error rate and frame error rate are significantly improved compared to existing hard-decision and soft-input soft-output algorithms. Additionally, through model architecture optimization, the proposed method reduces time consumption by an order of magnitude compared to other soft decoders.
Problem

Research questions and friction points this paper is trying to address.

Correcting multiple insertion, deletion, and substitution errors in DNA data storage.
Enhancing Varshamov-Tenengolts codes for multiple error correction using transformer-based decoders.
Reducing time consumption and improving error rates in VT code decoding.
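For context on the single-error regime the paper starts from: a binary VT code VT_a(n) is the set of words x with sum(i * x_i) ≡ a (mod n + 1) (1-indexed), and a single deletion can be undone with Levenshtein's classical decoder. The sketch below illustrates that classical baseline only, not the proposed TVTD; function names are illustrative.

```python
def weighted_sum(x):
    """1-indexed weighted checksum sum(i * x_i) of a binary word."""
    return sum((i + 1) * b for i, b in enumerate(x))

def correct_single_deletion(y, n, a=0):
    """Levenshtein's decoder for VT_a(n): reinsert the one deleted bit so
    the result x satisfies weighted_sum(x) % (n + 1) == a."""
    assert len(y) == n - 1, "exactly one deletion assumed"
    w = sum(y)                                # Hamming weight of y
    s = (a - weighted_sum(y)) % (n + 1)       # checksum deficiency
    if s <= w:
        # Deleted bit was 0: insert it with exactly s ones to its right.
        pos, ones_right = len(y), 0
        while ones_right < s:
            pos -= 1
            ones_right += y[pos]
        return y[:pos] + [0] + y[pos:]
    # Deleted bit was 1: insert it with exactly s - w - 1 zeros to its left.
    pos, zeros_left = 0, 0
    while zeros_left < s - w - 1:
        zeros_left += 1 - y[pos]
        pos += 1
    return y[:pos] + [1] + y[pos:]

# Demo: a codeword of VT_0(4), since 1*1 + 4*1 = 5 ≡ 0 (mod 5);
# deleting any single bit is recoverable.
x = [1, 0, 0, 1]
for i in range(4):
    y = x[:i] + x[i + 1:]
    assert correct_single_deletion(y, n=4) == x
```

This decoder is guaranteed to work only for one deletion (or, with standard extensions, one insertion or substitution), which is exactly the limitation the TVTD approach targets.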
Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based VT decoder for multiple IDS errors
Symbol- and statistic-based codeword embedding
Model optimization reduces time consumption significantly
Yali Wei
Center for Applied Mathematics, Tianjin University, Tianjin, China
Alan J.X. Guo
Center for Applied Mathematics, Tianjin University, Tianjin, China
Combinatorics, Deep Learning
Zihui Yan
Jiangnan University
Low-level vision
Yufan Dai
Center for Applied Mathematics, Tianjin University, Tianjin, China