Proteina: Scaling Flow-based Protein Structure Generative Models

📅 2025-03-02
🤖 AI Summary
This work addresses insufficient diversity, weak controllability, and poor scalability in *de novo* protein backbone design, particularly for long chains (up to 800 residues). It proposes Proteina, a large-scale flow-based backbone generator. Methodologically: (1) a hierarchical fold-class conditioning mechanism enables high-level control over secondary structure as well as low-level generation of specific folds; (2) a tailored, scalable transformer architecture with up to five times as many parameters as previous models is combined with LoRA fine-tuning, classifier-free guidance, and autoguidance, each adapted to protein backbones, and with adjusted training objectives; (3) training data is scaled to millions of synthetic protein structures; (4) a new set of metrics directly measures the distributional similarity between generated proteins and reference sets, complementing existing metrics. Proteina achieves state-of-the-art performance on *de novo* backbone design and produces diverse, designable proteins at lengths of up to 800 residues.

📝 Abstract
Recently, diffusion- and flow-based generative models of protein structures have emerged as a powerful tool for de novo protein design. Here, we develop Proteina, a new large-scale flow-based protein backbone generator that utilizes hierarchical fold class labels for conditioning and relies on a tailored scalable transformer architecture with up to 5x as many parameters as previous models. To meaningfully quantify performance, we introduce a new set of metrics that directly measure the distributional similarity of generated proteins with reference sets, complementing existing metrics. We further explore scaling training data to millions of synthetic protein structures and explore improved training and sampling recipes adapted to protein backbone generation. This includes fine-tuning strategies like LoRA for protein backbones, new guidance methods like classifier-free guidance and autoguidance for protein backbones, and new adjusted training objectives. Proteina achieves state-of-the-art performance on de novo protein backbone design and produces diverse and designable proteins at unprecedented length, up to 800 residues. The hierarchical conditioning offers novel control, enabling high-level secondary-structure guidance as well as low-level fold-specific generation.
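The abstract names classifier-free guidance and autoguidance for a flow-based sampler but does not give formulas. The following is a minimal sketch of how such guidance is commonly applied to a velocity field, not Proteina's actual implementation. The function names, the weights `omega` and `alpha`, and the way the two guidance styles are blended are all assumptions made for illustration.

```python
import numpy as np

def guided_velocity(v_cond, v_uncond, v_weak, omega=2.0, alpha=0.5):
    """Blend classifier-free guidance and autoguidance (illustrative assumption).

    v_cond   : velocity from the main model with the fold-class condition
    v_uncond : velocity from the main model without the condition
    v_weak   : velocity from a weaker model (the autoguidance reference)
    omega    : guidance weight; omega = 1 returns v_cond unchanged
    alpha    : 0 gives pure classifier-free guidance, 1 gives pure autoguidance
    """
    reference = (1.0 - alpha) * v_uncond + alpha * v_weak
    # Extrapolate from the reference velocity toward the conditional one.
    return omega * v_cond + (1.0 - omega) * reference

def euler_sample(velocity_fn, x0, n_steps=100):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = np.array(x0, dtype=float)
    dt = 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x
```

In a real sampler, `velocity_fn` would evaluate the trained networks at each step and pass their outputs through `guided_velocity`; here it is left as an arbitrary callable so the sketch stays self-contained.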
Problem

Research questions and friction points this paper is trying to address.

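How to scale flow-based protein backbone generation in model size, training data, and chain length (up to 800 residues).
How to measure the distributional similarity between generated proteins and reference sets directly, complementing existing metrics.
Which training and sampling recipes (LoRA fine-tuning, classifier-free guidance, autoguidance, adjusted training objectives) improve protein backbone generation.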
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical fold class labels for conditioning
Scalable transformer architecture with up to 5x as many parameters as previous models
New metrics for distributional similarity measurement
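
The summary does not say how distributional similarity is computed. One common way to compare a generated set with a reference set is a Fréchet distance between Gaussians fitted to per-structure feature vectors, sketched below. Both the choice of this metric and the existence of a feature extractor that maps each backbone to a fixed-length vector are assumptions, not details taken from the paper.

```python
import numpy as np

def frechet_distance(feats_a, feats_b):
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a, feats_b : arrays of shape (n_samples, n_features), n_features > 1,
    e.g. embeddings of generated and reference structures from a hypothetical
    feature extractor.
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Tr((cov_a cov_b)^(1/2)) equals the sum of square roots of the eigenvalues
    # of cov_a @ cov_b, which are real and non-negative for PSD covariances.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)
```

A lower value indicates that the generated features are distributed more like the reference features; identical sets give a distance of approximately zero.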