CryoProt: A Protein Pretraining Framework with Cross-Box Interactions on Cryo-EM Density Maps

📅 2026-05-30
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses two gaps in cryo-electron microscopy (cryo-EM) protein representation: the absence of a general-purpose pretraining framework, and the difficulty of capturing global dependencies among local regions of a density map. The authors propose CryoProt, a framework whose Map Encoder uses a cross-box interaction mechanism to explicitly model the global structural relationships of cryo-EM density maps in a shared latent space. By combining multi-head latent attention (MLA) with multi-task self-supervised pretraining, CryoProt learns transferable protein representations that do not require density map inputs during downstream inference. The method consistently outperforms existing approaches across multiple benchmarks, with gains of up to 12% over the best-performing baselines, which supports the importance of modeling cross-box interactions for cryo-EM data representation.
📝 Abstract
Despite the growing availability of cryo-electron microscopy (cryo-EM) density maps, effectively leveraging them for protein representation remains challenging. First, there is no general-purpose protein pretraining framework tailored to cryo-EM density maps and aimed at protein-related property prediction. Second, existing approaches typically partition density maps into local box regions and model them independently, overlooking the interactions across boxes that are essential for capturing global structural context in cryo-EM density maps. To address these challenges, we propose CryoProt, a protein pretraining framework designed for cryo-EM density maps. CryoProt introduces a Map Encoder based on multi-head latent attention (MLA), in which box-level representations interact through a shared latent space, enabling explicit modeling of cross-box dependencies within the density map. Furthermore, we adopt a multi-task pretraining strategy to learn generalizable representations that transfer effectively to diverse downstream tasks, such as protein flexibility prediction, for which cryo-EM density maps are not required as input and are instead inferred implicitly by the pretrained model. Experimental results demonstrate that CryoProt consistently outperforms existing state-of-the-art methods across multiple benchmarks, achieving up to 12% improvement over the best-performing baselines and highlighting the importance of modeling cross-box interactions in cryo-EM data. The source code is publicly available at https://anonymous.4open.science/r/CryoProt.
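The abstract does not spell out the Map Encoder, so the following is only a minimal NumPy sketch of one plausible reading, not the authors' implementation. It assumes non-overlapping cubic boxes, one flattened token per box, and a low-rank form of multi-head latent attention in which keys and values are rebuilt from a shared compressed latent. All names here (`partition_into_boxes`, `latent_attention`, `w_down`, `w_up_k`, `w_up_v`, the toy sizes) are hypothetical, and the weights are random rather than pretrained.

```python
import numpy as np

def partition_into_boxes(volume, box):
    """Split a cubic (D, D, D) map into non-overlapping box^3 blocks, one flattened row per box."""
    d = volume.shape[0]
    assert volume.shape == (d, d, d) and d % box == 0
    n = d // box
    blocks = volume.reshape(n, box, n, box, n, box).transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(n ** 3, box ** 3)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def latent_attention(tokens, params, n_heads):
    """Attention over box tokens; keys and values come from a shared low-rank latent (assumed form)."""
    n, d_model = tokens.shape
    d_head = d_model // n_heads

    def split_heads(m):
        return m.reshape(n, n_heads, d_head).transpose(1, 0, 2)  # (heads, boxes, d_head)

    latent = tokens @ params["w_down"]             # (boxes, d_latent): shared latent space
    q = split_heads(tokens @ params["w_q"])
    k = split_heads(latent @ params["w_up_k"])
    v = split_heads(latent @ params["w_up_v"])
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)
    weights = softmax(scores, axis=-1)             # (heads, boxes, boxes): cross-box weights
    mixed = (weights @ v).transpose(1, 0, 2).reshape(n, d_model)
    return mixed @ params["w_o"], weights

# Toy usage on a random 8x8x8 "density map" split into eight 4x4x4 boxes.
rng = np.random.default_rng(0)
volume = rng.random((8, 8, 8))
boxes = partition_into_boxes(volume, box=4)
d_model, d_latent, n_heads = 16, 4, 4
w_embed = rng.normal(size=(boxes.shape[1], d_model)) * 0.1
params = {
    "w_q": rng.normal(size=(d_model, d_model)) * 0.1,
    "w_down": rng.normal(size=(d_model, d_latent)) * 0.1,
    "w_up_k": rng.normal(size=(d_latent, d_model)) * 0.1,
    "w_up_v": rng.normal(size=(d_latent, d_model)) * 0.1,
    "w_o": rng.normal(size=(d_model, d_model)) * 0.1,
}
tokens = boxes @ w_embed
out, weights = latent_attention(tokens, params, n_heads)
```

In this sketch every box attends to every other box, which is the contrast with modeling boxes independently; the `weights` array holds one box-to-box attention matrix per head. A real encoder would likely use a learned 3D embedding and positional information, neither of which is shown here.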
Problem

Research questions and friction points this paper is trying to address.

cryo-EM density maps
protein representation
cross-box interactions
pretraining framework
global structural context
Innovation

Methods, ideas, or system contributions that make the work stand out.

cryo-EM density maps
cross-box interactions
multi-head latent attention
protein pretraining
multi-task learning