🤖 AI Summary
RNA language models exhibit inconsistent performance across secondary structure prediction and functional classification tasks, and the field lacks a unified evaluation benchmark. Method: We systematically evaluate the zero-shot generalization of 13 RNA language models, categorized by modeling scope into three groups, and include DNA and protein language models as cross-modal baselines. We introduce the first unified benchmark covering both structural and functional tasks. Contribution/Results: Our analysis reveals a significant performance trade-off: models that excel at long-range base-pair modeling achieve superior structure prediction accuracy but underperform on functional classification, and vice versa. This indicates that current unsupervised pretraining paradigms fail to balance structural and functional objectives. The study provides key empirical evidence and a methodological framework to guide the design, evaluation, and task-specific adaptation of RNA language models, advancing principled development in computational RNA biology.
📝 Abstract
Given the usefulness of protein language models (LMs) in structure and function inference, RNA LMs have received increasing attention in recent years. However, these RNA models are often not compared against a common standard. Here, we divided RNA LMs into three classes (those pretrained on multiple RNA types, especially noncoding RNAs; those pretrained for specific RNA types; and those that unify RNA with DNA, proteins, or both) and compared 13 RNA LMs, along with three DNA LMs and one protein LM as controls, on zero-shot prediction of RNA secondary structure and functional classification. The results show that models performing well on secondary structure prediction often do worse on function classification, and vice versa, suggesting that more balanced unsupervised training is needed.