Towards Foundation Models for Cryo-ET Subtomogram Analysis

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
Subtomogram classification, alignment, and averaging in cryo-electron tomography (cryo-ET) face three bottlenecks: scarce annotations, severe noise, and poor generalization. This paper takes a first step towards foundation models for cryo-ET subtomogram analysis. The method combines three components: (1) CryoEngine, a large-scale synthetic data generator that produces over 904k subtomograms from 452 particle classes for pretraining; (2) APT-ViT, a Vision Transformer with adaptive phase tokenization, an equivariance-enhancing module that improves robustness to geometric and semantic variations; and (3) NRCL, a Noise-Resilient Contrastive Learning strategy that stabilizes representation learning under severe noise. Evaluated on 24 synthetic and real datasets, the framework reports state-of-the-art performance on all three tasks and strong generalization to unseen datasets.

📝 Abstract
Cryo-electron tomography (cryo-ET) enables in situ visualization of macromolecular structures, where subtomogram analysis tasks such as classification, alignment, and averaging are critical for structural determination. However, effective analysis is hindered by scarce annotations, severe noise, and poor generalization. To address these challenges, we take the first step towards foundation models for cryo-ET subtomograms. First, we introduce CryoEngine, a large-scale synthetic data generator that produces over 904k subtomograms from 452 particle classes for pretraining. Second, we design an Adaptive Phase Tokenization-enhanced Vision Transformer (APT-ViT), which incorporates adaptive phase tokenization as an equivariance-enhancing module that improves robustness to both geometric and semantic variations. Third, we introduce a Noise-Resilient Contrastive Learning (NRCL) strategy to stabilize representation learning under severe noise conditions. Evaluations across 24 synthetic and real datasets demonstrate state-of-the-art (SOTA) performance on all three major subtomogram tasks and strong generalization to unseen datasets, advancing scalable and robust subtomogram analysis in cryo-ET.

Problem

Research questions and friction points this paper is trying to address.

Addressing scarce annotations and noise in cryo-ET subtomogram analysis
Developing foundation models for classification, alignment, and averaging tasks
Enhancing generalization and robustness for macromolecular structure determination

Innovation

Methods, ideas, or system contributions that make the work stand out.

Large-scale synthetic data generation for pretraining
Adaptive phase tokenization-enhanced vision transformer
Noise-resilient contrastive learning for stable representation

Authors

Runmin Jiang
Carnegie Mellon University

Wanyue Feng
Carnegie Mellon University

Yuntian Yang
Harvard University
Computational Biology, Computer Vision

Shriya Pingulkar
K. J. Somaiya College of Engineering

Hong Wang
University of Alabama at Birmingham

Xi Xiao
Oak Ridge National Laboratory | University of Alabama at Birmingham
LLM / MLLM Efficiency, Image / Video Generation, Image / Video Understanding

Xiaoyu Cao
Carnegie Mellon University

Genpei Zhang
University of Electronic Science and Technology of China
Computer Vision

Xiao Wang
Oak Ridge National Laboratory

Xiaolong Wu
Georgia Institute of Technology
SLAM, Localization, Robotics

Tianyang Wang
University of Alabama at Birmingham
machine learning (deep learning), computer vision

Yang Liu
Carnegie Mellon University

Xingjian Li
Carnegie Mellon University

Min Xu
Carnegie Mellon University