Beyond Conditional Computation: Retrieval-Augmented Genomic Foundation Models with Gengram

📅 2026-01-29

🤖 AI Summary

This work addresses the inefficiency and limited interpretability of existing genomic foundation models, which must learn conserved biological motifs implicitly from single-nucleotide inputs. The authors propose Gengram, a module that introduces structured motif memory as a modeling primitive. It explicitly retrieves multi-nucleotide motifs through a genome-specific hashing scheme and a conditional memory mechanism, capturing a form of genomic “grammar.” When integrated into mainstream genomic foundation model backbones, Gengram yields an efficient, biologically aligned retrieval-augmented architecture. Across several functional genomics tasks, it improves performance by up to 14% and generalizes across backbone architectures. It also produces latent representations that align closely with established biological knowledge, which strengthens mechanistic interpretability.
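
A hashed motif lookup of this kind could work roughly as shown in the Python sketch below. It is illustrative only and is not Gengram's published scheme. The 2-bit-per-base packing, the multiplicative hash constant, the table size, the choice of k, and the helper names kmer_code, hash_slot, motif_slots, make_table and lookup are all assumptions. Each position maps to the memory slot of the k-mer ending there. As a result, identical motifs always retrieve the same stored vector, with a single index operation instead of extra neural computation.

```python
import random

# Assumption: 2-bit code per nucleotide; k-mers containing other symbols (e.g. N) are skipped.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def kmer_code(kmer: str) -> int:
    """Pack a DNA k-mer into an integer, 2 bits per base."""
    code = 0
    for base in kmer:
        code = (code << 2) | BASE_CODE[base]
    return code


def hash_slot(code: int, k: int, table_size: int) -> int:
    """Map a packed k-mer to a memory slot with a (hypothetical) multiplicative hash.

    k is mixed in so that, e.g., "AA" and "AAA" (both packed to 0) land in different slots.
    """
    mixed = (code * 0x9E3779B1 + k) & 0xFFFFFFFF
    return mixed % table_size


def motif_slots(seq: str, k: int, table_size: int) -> list[int]:
    """One slot per position: the k-mer ending at that position, or -1 if there is none."""
    slots = []
    for i in range(len(seq)):
        start = i - k + 1
        kmer = seq[start : i + 1] if start >= 0 else ""
        if len(kmer) == k and all(b in BASE_CODE for b in kmer):
            slots.append(hash_slot(kmer_code(kmer), k, table_size))
        else:
            slots.append(-1)
    return slots


def make_table(table_size: int, dim: int, seed: int = 0) -> list[list[float]]:
    """Randomly initialised motif memory; in a trained model these would presumably be learned."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(table_size)]


def lookup(slots: list[int], table: list[list[float]], dim: int) -> list[list[float]]:
    """Retrieve one vector per position; positions without a valid motif get zeros."""
    return [list(table[s]) if s >= 0 else [0.0] * dim for s in slots]


if __name__ == "__main__":
    seq = "TATAAAGC"
    slots = motif_slots(seq, k=3, table_size=4096)
    memory = lookup(slots, make_table(4096, dim=8), dim=8)
    print(slots)
```

The table has a fixed size, so distinct k-mers can collide whenever table_size is smaller than 4^k. That is the usual trade-off of hashed lookups against an exact table of every possible motif.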

📝 Abstract

Current genomic foundation models (GFMs) rely on extensive neural computation to implicitly approximate conserved biological motifs from single-nucleotide inputs. We propose Gengram, a conditional memory module that introduces an explicit and highly efficient lookup primitive for multi-base motifs via a genome-specific hashing scheme, establishing genomic "syntax". Integrated into the backbone of state-of-the-art GFMs, Gengram achieves substantial gains (up to 14%) across several functional genomics tasks. The module demonstrates robust architectural generalization, while further inspection of Gengram's latent space reveals the emergence of meaningful representations that align closely with fundamental biological knowledge. By establishing structured motif memory as a modeling primitive, Gengram simultaneously boosts empirical performance and mechanistic interpretability, providing a scalable and biology-aligned pathway for the next generation of GFMs. The code is available at https://github.com/zhejianglab/Genos, and the model checkpoint is available at https://huggingface.co/ZhejiangLab/Gengram.
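
The abstract describes Gengram as a conditional memory module integrated into an existing GFM backbone. It does not spell out how retrieved motif vectors are combined with the backbone's hidden states. The Python sketch below assumes one plausible option: a sigmoid gate computed from the RMS-normalised dot product between a position's hidden state and its retrieved motif vector. Under this assumption the backbone can down-weight a motif that does not fit its context, which is one way a lookup becomes "conditional." The gate formula and the names rms_norm and gated_fuse are hypothetical and are not confirmed as Gengram's design.

```python
import math


def rms_norm(x: list[float], eps: float = 1e-6) -> list[float]:
    """Scale a vector to unit root-mean-square (no learned gain, for simplicity)."""
    scale = 1.0 / math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v * scale for v in x]


def gated_fuse(hidden: list[float], memory: list[float]) -> tuple[list[float], float]:
    """Add a retrieved motif vector to a hidden state, scaled by a context-dependent gate.

    Assumed gate: sigmoid(rms_norm(hidden) . rms_norm(memory) / sqrt(d)).
    A zero memory vector (no motif retrieved) leaves the hidden state unchanged.
    """
    if not any(memory):
        return list(hidden), 0.0
    d = len(hidden)
    score = sum(a * b for a, b in zip(rms_norm(hidden), rms_norm(memory))) / math.sqrt(d)
    gate = 1.0 / (1.0 + math.exp(-score))
    fused = [h + gate * m for h, m in zip(hidden, memory)]
    return fused, gate


if __name__ == "__main__":
    fused, gate = gated_fuse([0.5, -0.1, 0.3, 0.0], [0.4, 0.0, 0.2, -0.1])
    print(fused, gate)
```

Returning the gate value alongside the fused state makes per-position motif usage easy to inspect.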

Problem

Research questions and friction points this paper is trying to address.

genomic foundation models

biological motifs

conditional computation

functional genomics

motif representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

retrieval-augmented

genomic foundation models

motif memory