A Comparative Review of RNA Language Models

📅 2025-05-14
🤖 AI Summary
Problem: RNA language models exhibit inconsistent performance across secondary structure prediction and functional classification tasks, and lack a unified evaluation benchmark.
Method: We systematically evaluate the zero-shot generalization capabilities of 13 RNA language models, categorized by modeling scope into three groups, and include DNA and protein language models as cross-modal baselines. We introduce the first unified benchmark covering both structural and functional tasks.
Contribution/Results: Our analysis reveals a significant performance trade-off: models excelling at long-range base-pair modeling achieve superior structural prediction accuracy but underperform on functional classification, and vice versa. This indicates a critical deficiency in task balance within current unsupervised pretraining paradigms. The study provides key empirical evidence and a methodological framework to guide the design, evaluation, and task-specific adaptation of RNA language models, advancing principled development in computational RNA biology.

📝 Abstract
Given the usefulness of protein language models (LMs) in structure and function inference, RNA LMs have received increased attention in the last few years. However, these RNA models are often not compared against the same standard. Here, we divided RNA LMs into three classes: those pretrained on multiple RNA types (especially noncoding RNAs), those pretrained on specific-purpose RNAs, and those that unify RNA with DNA, proteins, or both. We compared 13 RNA LMs, along with 3 DNA LMs and 1 protein LM as controls, in zero-shot prediction of RNA secondary structure and functional classification. The results show that models doing well on secondary structure prediction often perform worse in functional classification, or vice versa, suggesting that more balanced unsupervised training is needed.
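As a rough illustration of how a zero-shot secondary structure prediction can be scored against a reference, the sketch below computes base-pair precision, recall, and F1 from dot-bracket strings. The choice of metric, the function names `parse_dot_bracket` and `base_pair_f1`, and the toy structures are assumptions made for illustration; the abstract does not state the scoring protocol used in the study.

```python
from typing import Set, Tuple


def parse_dot_bracket(structure: str) -> Set[Tuple[int, int]]:
    """Return the set of 0-indexed (i, j) base pairs encoded in dot-bracket notation."""
    stack = []
    pairs = set()
    for j, ch in enumerate(structure):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % j)
            pairs.add((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def base_pair_f1(predicted: str, reference: str) -> Tuple[float, float, float]:
    """Compare two dot-bracket structures and return (precision, recall, F1)."""
    if len(predicted) != len(reference):
        raise ValueError("structures must have the same length")
    pred = parse_dot_bracket(predicted)
    ref = parse_dot_bracket(reference)
    true_pos = len(pred & ref)  # pairs present in both structures
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(ref) if ref else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy example: the prediction recovers one of two reference pairs.
precision, recall, f1 = base_pair_f1("(....)", "((..))")
print(precision, recall, round(f1, 3))
```

Averaging such a score over a test set gives one number per model for the structural task, which can then be set beside a classification metric for the functional task to expose the kind of trade-off the abstract describes.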
Problem

Research questions and friction points this paper is trying to address.

Comparing RNA language models for structure and function prediction
Evaluating 13 RNA models with DNA and protein controls
Identifying trade-offs between structure and function performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Classified RNA LMs into three distinct categories
Compared 13 RNA LMs with DNA and protein controls
Highlighted need for balanced unsupervised training
He Wang
Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China
Yikun Zhang
Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China; School of Electronic and Computer Engineering, Peking University, Shenzhen 518055, China
Jie Chen
School of Electronic and Computer Engineering, Peking University, Shenzhen 518055, China
Jian Zhan
Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen 518107, China; Ribopeutic Inc, Guangzhou International Bio Island, Guangdong 510320, China
Yaoqi Zhou
Senior Principal Investigator, Institute of Systems & Physical Biology, Shenzhen Bay Laboratory
Computational Biology, Bioinformatics, Biophysics, Molecular Biology, Cellular Biology