🤖 AI Summary
This work studies input-dependent synchronization channels—where deletion/insertion error distributions depend on the entire input sequence—to more accurately model non-i.i.d. noise in practical systems such as DNA-based data storage.
Method: We combine ergodic theory and information-theoretic analysis with techniques of Pernice–Li–Wootters (ISIT 2022) and Brakensiek–Li–Spang (FOCS 2020).
Contribution/Results: We identify conditions on input-correlated synchronization channels under which the information capacity is achieved by a stationary ergodic input source and equals the coding capacity. These conditions generalize prior work and cover a wide class of channels, including channels with the correlated errors observed in DNA-based data storage and their multi-trace versions. As an application of this capacity theorem, we obtain explicit capacity-achieving codes for multi-trace channels with run-length-dependent deletions, an error pattern motivated by DNA-based data storage systems.
📝 Abstract
"Independent and identically distributed" errors do not accurately capture the noisy behavior of real-world data storage and information transmission technologies. Motivated by this, we study channels with input-correlated synchronization errors, meaning that the distribution of synchronization errors (such as deletions and insertions) applied to the $i$-th input $x_i$ may depend on the whole input string $x$. We begin by identifying conditions on the input-correlated synchronization channel under which the channel's information capacity is achieved by a stationary ergodic input source and is equal to its coding capacity. These conditions capture a wide class of channels, including channels with correlated errors observed in DNA-based data storage systems and their multi-trace versions, and generalize prior work. To showcase the usefulness of the general capacity theorem above, we combine it with techniques of Pernice-Li-Wootters (ISIT 2022) and Brakensiek-Li-Spang (FOCS 2020) to obtain explicit capacity-achieving codes for multi-trace channels with runlength-dependent deletions, motivated by error patterns observed in DNA-based data storage systems.
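To make the channel model concrete, the following is a minimal simulation sketch of a multi-trace channel with run-length-dependent deletions. It is an illustration under stated assumptions, not the paper's formal definition: the function names, the rule that each symbol is deleted independently given the length of its run, and the deletion-probability curve `example_delete_prob` are all hypothetical choices made for this example.

```python
import random
from itertools import groupby


def runlength_deletion_channel(x, delete_prob, rng):
    """Send string x through one use of a run-length-dependent deletion channel.

    Assumption (illustrative only): every symbol in a run of length r is
    deleted independently with probability delete_prob(r).
    """
    out = []
    for symbol, run in groupby(x):
        run_len = len(list(run))
        p = delete_prob(run_len)
        # Keep each symbol of the run with probability 1 - p.
        out.extend(symbol for _ in range(run_len) if rng.random() >= p)
    return "".join(out)


def multi_trace(x, delete_prob, num_traces, seed=0):
    """Produce num_traces independent noisy copies (traces) of the same input x."""
    rng = random.Random(seed)
    return [runlength_deletion_channel(x, delete_prob, rng) for _ in range(num_traces)]


def example_delete_prob(run_len):
    # Hypothetical curve: longer runs are more error-prone, capped at 0.5.
    return min(0.5, 0.05 * run_len)


if __name__ == "__main__":
    for trace in multi_trace("AAACGGGGGTTAC", example_delete_prob, num_traces=3):
        print(trace)
```

Because the deletion probability of a symbol depends on the run it belongs to, the errors are determined by the input string rather than being i.i.d., which is the kind of input-correlated behavior the abstract describes. Each trace is a subsequence of the input, and a decoder in the multi-trace setting sees all traces at once.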