🤖 AI Summary
DNA-based data storage is limited by high synthesis costs and low coding density (for example, DNA Punchcards store only one bit per nicking site). To address this, we propose DNA Tails, a molecular encoding paradigm that enzymatically grows single-stranded DNA tails of varied lengths at backbone nicking sites, so that each site stores a nonbinary symbol carrying multiple bits rather than a single bit. Our key contributions are: (1) tail-length modulation as a coding mechanism, controlled via a staggered nicking and tail-extension process; (2) the use of rank modulation, originally proposed for flash memory, to mitigate calibration errors, together with a new family of rank modulation codes that correct "stuck-at" errors caused by stumped tail growth; and (3) constructions of permutation codes with order-optimal redundancy, with accompanying encoding and decoding algorithms. Experiments demonstrate the feasibility of the encoding approach and identify its common sources of error.
📝 Abstract
DNA-based data storage systems face practical challenges due to the high cost of DNA synthesis. A strategy to address the problem entails encoding data via topological modifications of the DNA sugar-phosphate backbone. The DNA Punchcards system, which introduces nicks (cuts) in the DNA backbone, encodes only one bit per nicking site, limiting density. We propose DNA Tails, a storage paradigm that encodes nonbinary symbols at nicking sites by growing enzymatically synthesized single-stranded DNA of varied lengths. The average tail lengths encode multiple information bits and are controlled via a staggered nicking-tail extension process. We demonstrate the feasibility of this encoding approach experimentally and identify common sources of errors, such as calibration errors and stumped tail growth errors. To mitigate calibration errors, we use rank modulation proposed for flash memory. To correct stumped tail growth errors, we introduce a new family of rank modulation codes that can correct "stuck-at" errors. Our analytical results include constructions for order-optimal-redundancy permutation codes and accompanying encoding and decoding algorithms.
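To illustrate why rank modulation helps with calibration errors, here is a minimal Python sketch of the general idea, not the paper's construction. It stores a message as the relative order of tail lengths across n nicking sites, so any distortion that preserves that order (for instance, a uniform scaling or offset in measured lengths) leaves the decoded message unchanged. The function names, the factorial-number-system mapping, and the length parameters `base` and `step` are all assumptions made for illustration. The sketch does not model the stuck-at errors that the new code family is designed to correct.

```python
from math import factorial

def index_to_permutation(index, n):
    """Map an integer in [0, n!) to a permutation of range(n) (factorial number system)."""
    items = list(range(n))
    perm = []
    for i in range(n, 0, -1):
        pos, index = divmod(index, factorial(i - 1))
        perm.append(items.pop(pos))
    return perm

def permutation_to_index(perm):
    """Inverse of index_to_permutation."""
    items = sorted(perm)
    index = 0
    for i, p in enumerate(perm):
        pos = items.index(p)
        index += pos * factorial(len(perm) - 1 - i)
        items.pop(pos)
    return index

def encode_lengths(message, n, base=10, step=15):
    """Hypothetical target tail length per site: site i gets rank perm[i].

    base and step (in nucleotides) are illustrative values, not from the paper.
    """
    perm = index_to_permutation(message, n)
    return [base + step * rank for rank in perm]

def decode_lengths(lengths):
    """Recover the message from measured lengths using only their relative order."""
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    ranks = [0] * len(lengths)
    for rank, site in enumerate(order):
        ranks[site] = rank  # rank 0 = shortest tail
    return permutation_to_index(ranks)

if __name__ == "__main__":
    n = 4
    target = encode_lengths(17, n)
    # Simulated calibration error: every measured length is scaled and shifted.
    measured = [1.4 * x + 3 for x in target]
    print(target, measured, decode_lengths(measured))
```

With n sites, a full permutation carries log2(n!) bits; for n = 4 that is about 4.58 bits, compared with 4 bits when each site stores a single bit. Decoding from ranks rather than absolute lengths is the design choice that removes the need for a calibrated length threshold.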