RNACG: A Universal RNA Sequence Conditional Generation model based on Flow-Matching

📅 2024-07-29
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
RNA design is hampered by high structural flexibility and a scarcity of experimental 3D structures, so existing methods struggle to satisfy diverse constraints (phylogenetic family membership, secondary and tertiary structural specifications, and functional site requirements) within a single generative framework. To address this, we propose the first flow-matching-based universal conditional generation framework for RNA sequence design. Our approach features a modular conditional encoder that enables discrete sequence representation and unifies multiple design paradigms, including family-specific generation, structure-constrained design, and binding-site-guided inverse folding. The framework significantly improves conditional controllability and cross-task generalization. It achieves state-of-the-art performance on Rfam family generation, secondary structure design, and PDB binding-site inverse folding, producing sequences with both high structural validity and functional compatibility.

📝 Abstract
RNA plays a pivotal role in diverse biological processes, ranging from gene regulation to catalysis. Recent advances in RNA design, such as RfamGen, RiboDiffusion and RDesign, have demonstrated promising results, with successful designs of functional sequences. However, RNA design remains challenging due to the inherent flexibility of RNA molecules and the scarcity of experimental data on tertiary and secondary structures compared to proteins. These limitations highlight the need for a more universal and comprehensive approach to RNA design that integrates diverse annotation information at the sequence level. To address these challenges, we propose RNACG (RNA Conditional Generator), a universal framework for RNA sequence design based on flow matching. RNACG supports diverse conditional inputs, including structural, functional, and family-specific annotations, and offers a modular design that allows users to customize the encoding network for specific tasks. By unifying sequence generation under a single framework, RNACG enables the integration of multiple RNA design paradigms, from family-specific generation to tertiary structure inverse folding.
Problem

Research questions and friction points this paper is trying to address.

RNA design challenges due to molecular flexibility
Scarcity of experimental data on RNA structures
Need for universal RNA sequence generation framework
Innovation

Methods, ideas, or system contributions that make the work stand out.

Universal RNA sequence design framework
Flow-matching based conditional generation
Modular encoding for diverse RNA annotations
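
To make "flow-matching based conditional generation" concrete, here is a minimal sketch of the generic technique on one-hot RNA sequences. It is not the RNACG implementation: the linear interpolation path, the Euler sampler, and every name below (`one_hot`, `interpolate`, `target_velocity`, `euler_sample`, `oracle_velocity`) are illustrative assumptions. In particular, `oracle_velocity` stands in for a trained network and cheats by receiving the target sequence as its "condition"; a real model would instead be conditioned on encoded annotations such as family or structure.

```python
import numpy as np

ALPHABET = "ACGU"  # assumed nucleotide vocabulary for this sketch


def one_hot(seq):
    """Encode an RNA string as an (L, 4) one-hot matrix."""
    idx = np.array([ALPHABET.index(c) for c in seq])
    return np.eye(len(ALPHABET))[idx]


def decode(x):
    """Map an (L, 4) real-valued matrix back to a string via per-position argmax."""
    return "".join(ALPHABET[i] for i in x.argmax(axis=-1))


def interpolate(x0, x1, t):
    """Point on the linear path x_t = (1 - t) * x0 + t * x1 between noise and data."""
    return (1.0 - t) * x0 + t * x1


def target_velocity(x0, x1):
    """Regression target for the velocity network along the linear path."""
    return x1 - x0


def euler_sample(velocity_fn, x0, cond, n_steps=10):
    """Integrate dx/dt = velocity_fn(x, t, cond) from t = 0 to t = 1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t, cond)
    return x


def oracle_velocity(x, t, cond):
    """Hypothetical stand-in for a trained model: the exact velocity toward `cond`.

    For the linear path, the velocity at x_t that reaches x1 at t = 1 is
    (x1 - x_t) / (1 - t). Here `cond` is the target itself (toy setting only).
    """
    return (cond - x) / (1.0 - t)


# Toy usage: start from Gaussian noise and flow toward a known sequence.
rng = np.random.default_rng(0)
x1 = one_hot("GGCAUCC")
x0 = rng.normal(size=x1.shape)
print(decode(euler_sample(oracle_velocity, x0, x1)))
```

During training, a network would be fit to predict `target_velocity(x0, x1)` from `interpolate(x0, x1, t)`, `t`, and the condition encoding; at generation time only the noise sample and the condition are available, and the sampler above produces the sequence.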
Letian Gao
MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China; Institute for Precision Medicine, Tsinghua University, Beijing 100084, China
Zhi John Lu
MOE Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, School of Life Sciences, Tsinghua University, Beijing 100084, China; Institute for Precision Medicine, Tsinghua University, Beijing 100084, China