P2DFlow: A Protein Ensemble Generative Model with SE(3) Flow Matching

📅 2024-11-26
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of modeling protein structural dynamics by proposing P2DFlow, an SE(3) flow matching framework that generates conformational ensembles rather than isolated static structures. Methodologically: (i) it defines the flow process on SE(3), so that generation respects rotational and translational symmetry; (ii) it uses a purpose-designed prior for the flow process to guide generation; and (iii) it adds an extra dimension describing the ensemble data, which helps the model distinguish intermediate states and reflects the physical laws governing the ensemble distribution. Trained and evaluated on the ATLAS molecular dynamics dataset, the model outperforms the baseline models and captures the dynamic fluctuations observed in crystal structures and MD simulations. The generated ensembles could serve as a proxy for protein molecular simulation and aid in understanding protein function.

📝 Abstract
Biological processes, functions, and properties are intricately linked to the ensemble of protein conformations, rather than being solely determined by a single stable conformation. In this study, we have developed P2DFlow, a generative model based on SE(3) flow matching, to predict the structural ensembles of proteins. We specifically designed a valuable prior for the flow process and enhanced the model's ability to distinguish each intermediate state by incorporating an additional dimension to describe the ensemble data, which can reflect the physical laws governing the distribution of ensembles, so that the prior knowledge can effectively guide the generation process. When trained and evaluated on the MD datasets of ATLAS, P2DFlow outperforms other baseline models on extensive experiments, successfully capturing the observable dynamic fluctuations as evidenced in crystal structure and MD simulations. As a potential proxy agent for protein molecular simulation, the high-quality ensembles generated by P2DFlow could significantly aid in understanding protein functions across various scenarios. Code is available at https://github.com/BLEACH366/P2DFlow
Problem

Research questions and friction points this paper is trying to address.

Predict protein structural ensembles using SE(3) flow matching.
Enhance ensemble generation with physical law-based prior knowledge.
Improve understanding of protein functions via dynamic fluctuation capture.
Innovation

Methods, ideas, or system contributions that make the work stand out.

SE(3) flow matching for protein ensemble generation
Enhanced prior knowledge integration in flow process
Additional dimension for improved ensemble state distinction
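
To make the phrase "SE(3) flow matching" concrete, the sketch below shows the interpolation step that flow matching methods on rigid-body frames typically rely on: translations are interpolated linearly and rotations along the SO(3) geodesic. It is a generic illustration written with the Python standard library only, not P2DFlow's implementation. The function names (`exp_so3`, `log_so3`, `interpolate_frame`) are hypothetical, and the paper's specific prior and additional ensemble dimension are not modeled here.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """Transpose a 3x3 matrix."""
    return [[a[j][i] for j in range(3)] for i in range(3)]

def exp_so3(w):
    """Rodrigues' formula: rotation vector -> 3x3 rotation matrix."""
    theta = math.sqrt(sum(c * c for c in w))
    eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    if theta < 1e-8:
        return eye
    x, y, z = (c / theta for c in w)
    K = [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]  # skew matrix of the unit axis
    K2 = matmul(K, K)
    s, c = math.sin(theta), 1.0 - math.cos(theta)
    return [[eye[i][j] + s * K[i][j] + c * K2[i][j] for j in range(3)]
            for i in range(3)]

def log_so3(R):
    """Rotation matrix -> rotation vector. Not robust for angles near pi."""
    cos_t = max(-1.0, min(1.0, (R[0][0] + R[1][1] + R[2][2] - 1.0) / 2.0))
    theta = math.acos(cos_t)
    if theta < 1e-8:
        return [0.0, 0.0, 0.0]
    k = theta / (2.0 * math.sin(theta))
    return [k * (R[2][1] - R[1][2]), k * (R[0][2] - R[2][0]), k * (R[1][0] - R[0][1])]

def interpolate_frame(R0, x0, R1, x1, t):
    """Point at time t on the path from a prior frame (R0, x0) to a data frame (R1, x1).

    Returns the interpolated frame plus the constant target velocities
    (body-frame angular velocity and translation velocity) that a network
    would be regressed against in a generic flow matching setup.
    """
    omega = log_so3(matmul(transpose(R0), R1))          # relative rotation vector
    R_t = matmul(R0, exp_so3([t * w for w in omega]))   # geodesic on SO(3)
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]  # straight line in R^3
    v_x = [b - a for a, b in zip(x0, x1)]
    return R_t, x_t, omega, v_x

# Illustrative values only: one residue frame moving from a prior sample to a target.
R0, x0 = exp_so3([0.3, -0.2, 0.5]), [0.0, 0.0, 0.0]
R1, x1 = exp_so3([-0.4, 0.1, 0.9]), [1.0, 2.0, 3.0]
R_half, x_half, omega, v_x = interpolate_frame(R0, x0, R1, x1, 0.5)
print(x_half)
```

At `t = 0` the path sits on the prior sample and at `t = 1` on the data frame; a model trained this way learns a velocity field that transports prior samples to data, and sampling many prior draws yields an ensemble rather than a single structure.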
Yaowei Jin
Lingang Laboratory, Shanghai 200031, China.
Qi Huang
Institute for Electric Light Sources, School of Information Science and Technology, Fudan University, Shanghai 200438, P. R. China.
Ziyang Song
Shanghai Key Lab of Chemical Assessment and Sustainability, School of Chemical Science and Engineering, Tongji University, Shanghai 200092, P. R. China.
Mingyue Zheng
Shanghai Institute of Materia Medica, Chinese Academy of Sciences
Drug Discovery, Deep Learning, AI for Science, Molecular Design, Computational Biology
Dan Teng
Drug Discovery and Design Center, State Key Laboratory of Drug Research, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai 201203, China.
Qian Shi
Lingang Laboratory, Shanghai 200031, China.