Beyond Conditional Computation: Retrieval-Augmented Genomic Foundation Models with Gengram

📅 2026-01-29
🤖 AI Summary
This work addresses the inefficiency and poor interpretability of existing genomic foundation models, which rely on implicit learning of conserved biological motifs. The authors propose Gengram, a novel module that introduces structured motif memory as a modeling paradigm, explicitly retrieving multi-nucleotide motifs through genome-specific hash encoding and a conditional memory mechanism to construct genomic “grammar.” Integrated into mainstream genomic foundation model backbones, Gengram enables an efficient, biologically aligned retrieval-augmented architecture. Evaluated across multiple functional genomics tasks, the approach achieves performance gains of up to 14% while producing representations that align closely with established biological knowledge, thereby significantly enhancing both model generalization and mechanistic interpretability.

📝 Abstract
Current genomic foundation models (GFMs) rely on extensive neural computation to implicitly approximate conserved biological motifs from single-nucleotide inputs. We propose Gengram, a conditional memory module that introduces an explicit and highly efficient lookup primitive for multi-base motifs via a genomic-specific hashing scheme, establishing genomic "syntax". Integrated into the backbone of state-of-the-art GFMs, Gengram achieves substantial gains (up to 14%) across several functional genomics tasks. The module demonstrates robust architectural generalization, while further inspection of Gengram's latent space reveals the emergence of meaningful representations that align closely with fundamental biological knowledge. By establishing structured motif memory as a modeling primitive, Gengram simultaneously boosts empirical performance and mechanistic interpretability, providing a scalable and biology-aligned pathway for the next generation of GFMs. The code is available at https://github.com/zhejianglab/Genos, and the model checkpoint is available at https://huggingface.co/ZhejiangLab/Gengram.
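To make the idea of a hashed lookup for multi-base motifs concrete, here is a minimal illustrative sketch. It is not the Gengram implementation: the base-4 encoding, the modulo bucketing, the table size, and every name in it (`kmer_bucket`, `build_memory`, `motif_lookup`) are assumptions chosen for illustration, and the conditional (gated) part of the module is omitted. It only shows how each position in a sequence could retrieve a memory vector keyed by the k-mer ending there, instead of leaving the network to infer the motif from single nucleotides.

```python
import random

# Assumption: sequences contain only A, C, G, T (no N or IUPAC codes).
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_bucket(kmer, num_buckets):
    """Map a k-mer to a table slot (hypothetical scheme: base-4 code, then modulo)."""
    code = 0
    for base in kmer:
        code = code * 4 + BASE_TO_INT[base]
    return code % num_buckets


def build_memory(num_buckets, dim, seed=0):
    """Create a randomly initialised memory table; in a real model it would be learned."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(dim)] for _ in range(num_buckets)]


def motif_lookup(sequence, memory, k=3):
    """Return one memory vector per position, keyed by the k-mer ending at that position.

    The first k-1 positions have no complete k-mer and receive zero vectors.
    """
    dim = len(memory[0])
    vectors = []
    for i in range(len(sequence)):
        if i + 1 < k:
            vectors.append([0.0] * dim)
        else:
            kmer = sequence[i + 1 - k : i + 1]
            vectors.append(memory[kmer_bucket(kmer, len(memory))])
    return vectors


memory = build_memory(num_buckets=64, dim=4)
vectors = motif_lookup("ACGTACGT", memory, k=3)
```

In this sketch, both occurrences of the 3-mer `ACG` retrieve the same table row, which is the property that makes the lookup a shared motif memory. With 64 buckets and k = 3 there are no collisions; for larger k a smaller table would force distinct motifs to share rows, and how the actual hashing scheme handles that is not described in the abstract.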
Problem

Research questions and friction points this paper is trying to address.

genomic foundation models
biological motifs
conditional computation
functional genomics
motif representation
Innovation

Methods, ideas, or system contributions that make the work stand out.

retrieval-augmented
genomic foundation models
motif memory
conditional computation
genomic hashing
Huinan Xu (Genos Team)
Xuyang Feng (Genos Team)
Junhong Chen (Genos Team)
Junchen Liu (University of Texas Medical School, Houston, TX; cancer biology)
Kaiwen Deng (Genos Team)
Kai Ding (Genos Team)
Shengning Long (Genos Team)
Jiaxue Shuai (Genos Team)
Zhaorong Li (Alibaba Cloud; Computational Biology)
Shiping Liu (Genos Team)
Guirong Xue (Genos Team)
Zhan Xiao (Genos Team)