🤖 AI Summary
This work studies the minimum number of equal-length labels needed to achieve the maximum labeling capacity, which is 2 for DNA sequences and $\log_{2}q$ for an arbitrary $q$-ary alphabet. The problem is formulated as an extremal edge-count problem on de Bruijn graphs: finding path-unique subgraphs with the largest number of edges. This connects the design of DNA labeling schemes to extremal questions about de Bruijn subgraphs, bridging combinatorial coding theory and molecular labeling practice. Using combinatorial and graph-theoretic arguments, the paper derives upper and lower bounds on the minimum label count. These bounds may inform the design of high-density DNA barcodes and single-molecule imaging schemes.
📝 Abstract
DNA labeling is a tool in molecular biology and biotechnology to visualize, detect, and study DNA at the molecular level. In this process, a DNA molecule is labeled by a set of specific patterns, referred to as labels, and is then imaged. The resulting image is modeled as an $(\ell+1)$-ary sequence, where $\ell$ is the number of labels, in which any nonzero symbol indicates the appearance of the corresponding label in the DNA molecule. The labeling capacity refers to the maximum information rate that can be achieved by the labeling process for any given set of labels. The main goal of this paper is to study the minimum number of labels of the same length required to achieve the maximum labeling capacity of 2 for DNA sequences or $\log_{2}q$ for an arbitrary alphabet of size $q$. The solution to this problem requires the study of path-unique subgraphs of the de Bruijn graph with the largest number of edges. We provide upper and lower bounds on this value.
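To make the image model concrete, the following is a minimal sketch of how a set of equal-length labels could map a DNA string to an $(\ell+1)$-ary sequence. It rests on an assumption the abstract does not spell out: symbol $j$ is written at the position where the $j$-th label begins, and 0 is written everywhere else. The function name `label_image` and the example inputs are hypothetical, chosen only for illustration.

```python
def label_image(dna: str, labels: list[str]) -> list[int]:
    """Return an (len(labels) + 1)-ary image of `dna`.

    Assumption (not taken from the paper): symbol j (1-based) marks each
    position where labels[j - 1] starts; every other position is 0.
    """
    if not labels:
        return [0] * len(dna)

    k = len(labels[0])
    if any(len(lab) != k for lab in labels):
        raise ValueError("all labels must have the same length")
    if len(set(labels)) != len(labels):
        raise ValueError("labels must be distinct")

    # Distinct labels of equal length cannot start at the same position,
    # so each position receives at most one nonzero symbol.
    index = {lab: j for j, lab in enumerate(labels, start=1)}

    # Windows near the end are shorter than k and therefore map to 0.
    return [index.get(dna[i:i + k], 0) for i in range(len(dna))]


image = label_image("ACGTACG", ["AC", "GT"])
print(image)  # [1, 0, 2, 0, 1, 0, 0]
```

Here $\ell = 2$, so the image is a ternary sequence. Different DNA strings can produce the same image, which is why the achievable information rate depends on the choice of labels.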