Achieving DNA Labeling Capacity with Minimum Labels through Extremal de Bruijn Subgraphs

📅 2024-01-28

🏛️ International Symposium on Information Theory

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work determines the minimum number of equal-length DNA (or $q$-ary) labels required to achieve the maximum labeling capacity. The problem is formulated as an extremal problem: finding path-unique subgraphs of de Bruijn graphs with the largest number of edges. This establishes a direct correspondence between capacity-achieving DNA labeling and extremal de Bruijn subgraphs, linking combinatorial coding theory with molecular labeling practice. Using tools from graph theory, extremal combinatorics, and information theory, we derive upper and lower bounds on the minimum label count. The results inform the design of label sets for high-density DNA barcoding and single-molecule imaging, and indicate fundamental limits for molecular recognition systems.
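
A minimal sketch of the correspondence described above, assuming the standard order-$k$ de Bruijn graph (vertices are $(k-1)$-mers, and each $k$-mer is an edge from its prefix to its suffix) and assuming that the labels of length $k$ are exactly the edges removed from that graph. Under this reading, every $k$-mer is either a label or an edge of the remaining subgraph, so minimizing the number of labels is the same as maximizing the edge count of the subgraph. The helper names (`de_bruijn_edges`, `unlabeled_subgraph`, `label_count`) are hypothetical and not taken from the paper.

```python
from itertools import product


def de_bruijn_edges(q, k):
    """Order-k de Bruijn graph over the alphabet {0, ..., q-1}.

    Returns a dict mapping each k-mer (a tuple) to its edge
    (prefix (k-1)-mer, suffix (k-1)-mer).
    """
    return {w: (w[:-1], w[1:]) for w in product(range(q), repeat=k)}


def unlabeled_subgraph(q, k, labels):
    """Hypothetical helper: keep only the edges whose k-mer is not a label."""
    label_set = {tuple(lab) for lab in labels}
    return {w: e for w, e in de_bruijn_edges(q, k).items() if w not in label_set}


def label_count(q, k, subgraph):
    """Each of the q**k k-mers is either a label or an edge of the subgraph."""
    return q ** k - len(subgraph)


# Binary alphabet, labels of length 3: removing two of the eight edges
# leaves a six-edge subgraph, i.e. two labels.
sub = unlabeled_subgraph(2, 3, [(0, 0, 0), (1, 1, 1)])
print(len(sub), label_count(2, 3, sub))  # 6 2
```

The extremal question is then which subgraphs with the required path property keep as many of the $q^k$ edges as possible; the sketch only does the bookkeeping and makes no attempt at that optimization.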

📝 Abstract

DNA labeling is a tool in molecular biology and biotechnology to visualize, detect, and study DNA at the molecular level. In this process, a DNA molecule is labeled by a set of specific patterns, referred to as labels, and is then imaged. The resulting image is modeled as an $(\ell+1)$-ary sequence, where $\ell$ is the number of labels, in which any nonzero symbol indicates the appearance of the corresponding label in the DNA molecule. The labeling capacity refers to the maximum information rate that can be achieved by the labeling process for any given set of labels. The main goal of this paper is to study the minimum number of labels of the same length required to achieve the maximum labeling capacity of 2 for DNA sequences or $\log_{2} q$ for an arbitrary alphabet of size $q$. The solution to this problem requires the study of path unique subgraphs of the de Bruijn graph with the largest number of edges. We provide upper and lower bounds on this value.
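
The labeling process in the abstract can be illustrated with a small brute-force sketch. It assumes a convention of our own, not necessarily the paper's: the image has one symbol per starting position of a length-$k$ window, namely $j \in \{1, \dots, \ell\}$ if the window equals the $j$-th label and $0$ otherwise. Because the labels are distinct and of equal length, at most one label can start at any position, so the image is a well-defined $(\ell+1)$-ary sequence. The rate $\log_2(\text{number of distinct images})/n$ computed for a finite $n$ only approximates the labeling capacity, which is defined asymptotically in $n$. The functions `label_image` and `empirical_rate` are hypothetical names.

```python
from itertools import product
from math import log2


def label_image(seq, labels):
    """Hypothetical labeling map.

    Position i of the image holds j (1-based) if labels[j-1] starts at
    position i of seq, and 0 otherwise. All labels must be distinct and
    share the same length k, so at most one label can match per position.
    """
    k = len(labels[0])
    assert all(len(lab) == k for lab in labels), "labels must share one length"
    index = {lab: j + 1 for j, lab in enumerate(labels)}
    return tuple(index.get(seq[i:i + k], 0) for i in range(len(seq) - k + 1))


def empirical_rate(labels, n, alphabet="ACGT"):
    """log2(number of distinct images over all length-n sequences) / n."""
    images = {label_image("".join(s), labels) for s in product(alphabet, repeat=n)}
    return log2(len(images)) / n


print(label_image("ACGTA", ["CG", "TA"]))  # (0, 1, 0, 2)

# Labeling every 2-mer makes the image determine the sequence: rate 2.
all_2mers = [a + b for a in "ACGT" for b in "ACGT"]
print(empirical_rate(all_2mers, 6))  # 2.0

# A single 1-mer label only reveals where 'A' occurs: rate 1.
print(empirical_rate(["A"], 6))  # 1.0
```

Labeling all $4^k$ $k$-mers trivially reaches rate 2; the question studied in the paper is how many of them can be dropped while the rate still reaches 2 in the limit. Exhaustive enumeration as above is only feasible for very small $n$.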

Problem

Minimizing labels for maximum DNA labeling capacity

Studying path unique de Bruijn subgraphs for optimal labeling

Establishing upper and lower bounds on the minimum number of labels needed to achieve capacity
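
The exact definition of a path-unique subgraph is not given on this page. As a hypothetical working definition for illustration only, the sketch below calls a subgraph path unique if, for every ordered pair of vertices and every length $m$, there is at most one walk of length $m$ between them, and it checks this only up to a finite bound `max_len`. One intuition for why such a property could matter: a stretch of the molecule whose $k$-mers are all unlabeled produces only zeros in the image, and two distinct walks with the same endpoints and length would yield two different sequences with the same image. The helpers `subgraph_edges` and `is_path_unique` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def subgraph_edges(q, k, labels):
    """Edges (prefix, suffix) of the order-k de Bruijn graph whose k-mer is not a label."""
    label_set = {tuple(lab) for lab in labels}
    return [(w[:-1], w[1:]) for w in product(range(q), repeat=k) if w not in label_set]


def is_path_unique(edges, max_len):
    """Hypothetical check: at most one walk of each length m in 1..max_len
    between every ordered pair of vertices (u, v)."""
    succ = defaultdict(list)
    walks = defaultdict(int)  # walks[(u, v)] = number of length-m walks from u to v
    for u, v in edges:
        succ[u].append(v)
        walks[(u, v)] += 1  # length-1 walks are the edges themselves
    for _ in range(max_len):
        if any(c > 1 for c in walks.values()):
            return False
        # Extend every length-m walk by one edge to get the length-(m+1) counts.
        nxt = defaultdict(int)
        for (u, w), c in walks.items():
            for v in succ[w]:
                nxt[(u, v)] += c
        walks = nxt
    return True


# Binary, k = 2: the full graph has two length-2 walks from 0 to 0 (0-0-0 and 0-1-0).
print(is_path_unique(subgraph_edges(2, 2, []), 4))  # False
# Labeling 00 and 11 leaves the single cycle 0 -> 1 -> 0.
print(is_path_unique(subgraph_edges(2, 2, [(0, 0), (1, 1)]), 10))  # True
```

Counting walks by repeated one-edge extension is equivalent to inspecting the entries of successive powers of the adjacency matrix; the dictionary form simply avoids building the matrix for sparse subgraphs.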

Innovation

Uses extremal de Bruijn subgraphs for DNA labeling

Minimizes labels while maximizing labeling capacity

Provides bounds on label count for optimal capacity
