🤖 AI Summary
This work addresses the systematic identification of minimal undesignable motifs in RNA secondary structures—i.e., the smallest substructures that cannot be realized by any sequence under the standard Turner energy model—aiming to uncover local structural origins of global undesignability.
Method: We introduce a theoretical framework that establishes motif undesignability by searching for rival motifs rather than enumerating all candidate sequences. Motifs are represented as loop-pair graphs and compared with a recursive graph isomorphism algorithm; rotational invariance is exploited to detect, group, and reuse equivalent motifs. This yields a scalable, interpretable database of unique minimal undesignable motifs.
Results: We identify 24 unique minimal undesignable motifs among 18 undesignable puzzles from the Eterna100 benchmark; in ArchiveII, we find over 350 unique minimal undesignable motifs and 663 undesignable native structures. These findings can inform the refinement of RNA folding models and help characterize the limits of RNA designability.
📝 Abstract
RNA design aims to find a sequence that folds with highest probability into a designated target structure. However, certain structures are undesignable, meaning no sequence can fold into the target structure under the default (Turner) RNA folding model. Understanding the specific local structures (i.e., "motifs") that contribute to undesignability is crucial for refining RNA folding models and determining the limits of RNA designability. Despite its importance, this problem has received very little attention, and previous efforts are neither scalable nor interpretable. We develop a new theoretical framework for motif (un-)designability, and design scalable and interpretable algorithms to identify minimal undesignable motifs within a given RNA secondary structure. Our approach establishes motif undesignability by searching for rival motifs, rather than exhaustively enumerating all (partial) sequences that could potentially fold into the motif. Furthermore, we exploit rotational invariance in RNA structures to detect, group, and reuse equivalent motifs and to construct a database of unique minimal undesignable motifs. To achieve this, we propose a loop-pair graph representation for motifs and a recursive graph isomorphism algorithm for motif equivalence. Our algorithms successfully identify 24 unique minimal undesignable motifs among 18 undesignable puzzles from the Eterna100 benchmark. Surprisingly, we also find over 350 unique minimal undesignable motifs and 663 undesignable native structures in the ArchiveII dataset, drawn from a diverse set of RNA families. Our source code is available at https://github.com/shanry/RNA-Undesign and our web server is available at http://linearfold.org/motifs.
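To make the idea of rotational invariance concrete, the following is a minimal Python sketch for the simplest case, a single loop. It is not the loop-pair graph representation or the recursive isomorphism algorithm from the paper, and it does not come from the RNA-Undesign repository; the function names (`parse_pairs`, `loop_signature`, `canonical_rotation`) are hypothetical and chosen only for illustration. The sketch assumes dot-bracket input without pseudoknots, describes a loop by the lengths of the unpaired runs between its consecutive base pairs, and treats two loops as equivalent when one description is a cyclic rotation of the other.

```python
from typing import List, Tuple


def parse_pairs(structure: str) -> List[int]:
    """Return partner[i] for every position of a dot-bracket string (-1 if unpaired)."""
    partner = [-1] * len(structure)
    stack: List[int] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return partner


def loop_signature(partner: List[int], i: int) -> Tuple[int, ...]:
    """Unpaired-run lengths around the loop closed by the pair (i, partner[i]).

    Walking 5' to 3' inside the closing pair, each enclosed helix is skipped
    and the number of unpaired bases between consecutive pairs is recorded.
    """
    j = partner[i]
    if j <= i:
        raise ValueError("position i must be the opening base of a pair")
    gaps: List[int] = []
    run = 0
    k = i + 1
    while k < j:
        if partner[k] == -1:
            run += 1
            k += 1
        else:
            gaps.append(run)    # unpaired run ends at an enclosed pair
            run = 0
            k = partner[k] + 1  # jump past the enclosed helix
    gaps.append(run)            # run between the last enclosed pair and the closing pair
    return tuple(gaps)


def canonical_rotation(sig: Tuple[int, ...]) -> Tuple[int, ...]:
    """Smallest cyclic rotation, used as a rotation-invariant key (illustrative choice)."""
    return min(sig[r:] + sig[:r] for r in range(len(sig)))


# A 1x2 and a 2x1 internal loop are the same loop viewed from different pairs.
a = loop_signature(parse_pairs("(.(...)..)"), 0)
b = loop_signature(parse_pairs("(..(...).)"), 0)
same_up_to_rotation = canonical_rotation(a) == canonical_rotation(b)
```

Choosing the lexicographically smallest rotation as the key is one simple way to group equivalent loops in a dictionary. Extending this to motifs made of several connected loops would require comparing whole graphs, which is where a graph isomorphism test such as the one described in the abstract comes in.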