GENEB: Why Genomic Models Are Hard to Compare

📅 2026-06-03
🤖 AI Summary
This study addresses the lack of standardized evaluation protocols, which hinders fair assessment of genomic foundation models' performance and generalization. To this end, the authors introduce GENEB, a large-scale diagnostic benchmark that evaluates frozen representations from 40 models on 100 tasks spanning 13 functional categories under a unified probing protocol, including few-shot settings. The framework enables category-aware, fine-grained, controlled comparisons along multiple dimensions, and it exposes both the instability of aggregate leaderboards and the trade-offs between tasks. Key findings: model rankings vary substantially across functional categories, increasing model scale yields limited and inconsistent gains, and architectural design and the alignment between pretraining data and downstream tasks matter more than parameter count alone.
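The leaderboard-instability point can be illustrated with a minimal sketch. The model names, category names, and scores below are hypothetical toy values, not GENEB results; the sketch only shows how a ranking by mean score can disagree with per-category rankings.

```python
from statistics import mean

# Hypothetical toy scores (NOT GENEB results): model -> category -> score.
scores = {
    "model_a": {"promoter": 0.90, "splicing": 0.60, "enhancer": 0.70},
    "model_b": {"promoter": 0.70, "splicing": 0.85, "enhancer": 0.62},
    "model_c": {"promoter": 0.75, "splicing": 0.70, "enhancer": 0.80},
}


def rank(model_to_score):
    """Return model names sorted from best to worst score."""
    return sorted(model_to_score, key=model_to_score.get, reverse=True)


# Aggregate leaderboard: rank models by their mean score over all categories.
aggregate = {m: mean(by_cat.values()) for m, by_cat in scores.items()}
aggregate_ranking = rank(aggregate)

# Category-aware view: one ranking per functional category.
categories = sorted({c for by_cat in scores.values() for c in by_cat})
per_category_ranking = {
    c: rank({m: scores[m][c] for m in scores}) for c in categories
}

print("aggregate:", aggregate_ranking)
for c, ranking in per_category_ranking.items():
    print(f"{c}: {ranking}")
```

In this toy data the aggregate winner is not the best model for promoter or splicing tasks, which is the kind of disagreement that category-aware model selection is meant to surface.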
📝 Abstract
Progress in genomic foundation models is difficult to assess due to fragmented benchmarks, incompatible evaluation protocols, and task-specific reporting. As a result, claims of superiority or generality across models are often not directly comparable. We introduce GENEB, a large-scale diagnostic benchmark that evaluates frozen representations from 40 genomic foundation models across 100 tasks spanning 13 functional categories under a unified probing-based protocol, including few-shot regimes. GENEB enables controlled comparison across model scale, architecture, tokenization, and pretraining data while explicitly exposing task-level trade-offs. Our analysis shows that aggregate leaderboards are unstable: model rankings vary sharply across task categories, scale provides only modest and inconsistent gains, and architectural and pretraining alignment frequently outweigh parameter count. These results highlight limitations of current evaluation practices and position GENEB as a reference framework for principled comparison and category-aware model selection in genomic machine learning.
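As a rough illustration of what probing a frozen representation in a few-shot regime involves, the sketch below trains a logistic-regression probe on fixed features. Everything in it is an assumption made for illustration: `frozen_embed` is a stand-in encoder based on 2-mer frequencies rather than a real genomic foundation model, the GC-rich versus AT-rich task is synthetic, and the probe and its hyperparameters are not taken from GENEB.

```python
import math
import random
from itertools import product

KMERS = ["".join(p) for p in product("ACGT", repeat=2)]


def frozen_embed(seq):
    """Stand-in for a frozen encoder (hypothetical): normalized 2-mer counts."""
    counts = [0.0] * len(KMERS)
    for i in range(len(seq) - 1):
        counts[KMERS.index(seq[i:i + 2])] += 1.0
    total = sum(counts) or 1.0
    return [c / total for c in counts]


def train_linear_probe(X, y, epochs=500, lr=1.0):
    """Fit logistic regression by SGD on fixed features; the encoder is never updated."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - t  # gradient of the log loss with respect to z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


def make_seq(rng, gc_rich, length=60):
    """Synthetic task: label 1 = GC-rich sequence, label 0 = AT-rich sequence."""
    weights = [1, 4, 4, 1] if gc_rich else [4, 1, 1, 4]  # order: A, C, G, T
    return "".join(rng.choices("ACGT", weights=weights, k=length))


rng = random.Random(0)
shots = 8  # few-shot regime: 8 labelled examples per class
train = [(make_seq(rng, lab), int(lab)) for lab in (True, False) for _ in range(shots)]
test = [(make_seq(rng, lab), int(lab)) for lab in (True, False) for _ in range(50)]

X_train = [frozen_embed(s) for s, _ in train]
y_train = [t for _, t in train]
w, b = train_linear_probe(X_train, y_train)

accuracy = sum(predict(w, b, frozen_embed(s)) == t for s, t in test) / len(test)
print(f"few-shot probe accuracy: {accuracy:.2f}")
```

Because only the probe's weights are trained, differences in accuracy between encoders under this kind of setup reflect the quality of the fixed representations rather than fine-tuning capacity, which is what makes a unified probing protocol suitable for controlled comparison.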
Problem

Research questions and friction points this paper is trying to address.

genomic foundation models
benchmarking
model comparison
evaluation protocols
representation evaluation
Innovation

Methods, ideas, or system contributions that make the work stand out.

genomic foundation models
benchmarking
probing-based evaluation
few-shot learning
model comparison