🤖 AI Summary
This work addresses the challenge of modeling protein structural dynamics by proposing P2DFlow, an SE(3)-equivariant flow matching framework that generates physically plausible, functionally relevant conformational ensembles rather than isolated static structures. Methodologically: (i) it defines an SE(3)-equivariant flow process that preserves rotational and translational symmetry; (ii) it incorporates physical priors, including covalent bond length and dihedral angle constraints, to guide generation toward chemically valid conformations; and (iii) it introduces an auxiliary latent dimension that explicitly encodes ensemble-level distributional characteristics. Trained on the ATLAS molecular dynamics dataset, the model outperforms existing baselines in conformational diversity, fidelity to crystal structures, and accuracy in reproducing the fluctuations seen in MD trajectories. The resulting ensembles are high-quality and functionally interpretable, supporting structure-based functional analysis and drug design.
📝 Abstract
Biological processes, functions, and properties are intricately linked to the ensemble of conformations a protein adopts, rather than being determined solely by a single stable conformation. In this study, we develop P2DFlow, a generative model based on SE(3) flow matching, to predict the structural ensembles of proteins. We design a dedicated prior for the flow process and improve the model's ability to distinguish intermediate states by adding an extra dimension that describes the ensemble data. This dimension reflects the physical laws governing the distribution of ensembles, so that prior knowledge can effectively guide the generation process. When trained and evaluated on the MD datasets of ATLAS, P2DFlow outperforms baseline models across extensive experiments and captures the dynamic fluctuations observed in crystal structures and MD simulations. As a potential proxy for protein molecular simulation, P2DFlow generates high-quality ensembles that could significantly aid in understanding protein functions across various scenarios. Code is available at https://github.com/BLEACH366/P2DFlow
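For readers new to SE(3) flow matching, the sketch below illustrates the interpolation step such models are generally built on: each residue frame is moved from a prior sample to a data sample by linear interpolation of the translation and geodesic interpolation of the rotation on SO(3). This is a generic illustration, not P2DFlow's implementation. All function names are hypothetical, and the prior and the extra ensemble dimension described in the abstract are not modelled here.

```python
import math

def matmul(a, b):
    """Multiply two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(a):
    """Transpose a 3x3 matrix (the inverse, for a rotation matrix)."""
    return [list(col) for col in zip(*a)]

def rot_log(r):
    """Rotation matrix -> axis-angle vector (assumes angle below pi)."""
    cos_theta = max(-1.0, min(1.0, (r[0][0] + r[1][1] + r[2][2] - 1.0) / 2.0))
    theta = math.acos(cos_theta)
    if theta < 1e-8:
        return [0.0, 0.0, 0.0]
    scale = theta / (2.0 * math.sin(theta))
    return [scale * (r[2][1] - r[1][2]),
            scale * (r[0][2] - r[2][0]),
            scale * (r[1][0] - r[0][1])]

def rot_exp(w):
    """Axis-angle vector -> rotation matrix (Rodrigues' formula)."""
    theta = math.sqrt(sum(comp * comp for comp in w))
    if theta < 1e-8:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    x, y, z = w[0] / theta, w[1] / theta, w[2] / theta
    c, s = math.cos(theta), math.sin(theta)
    v = 1.0 - c
    return [[c + x * x * v, x * y * v - z * s, x * z * v + y * s],
            [y * x * v + z * s, c + y * y * v, y * z * v - x * s],
            [z * x * v - y * s, z * y * v + x * s, c + z * z * v]]

def interpolate_frame(x0, r0, x1, r1, t):
    """Frame at time t on the path from prior (x0, r0) to data (x1, r1)."""
    # Translation: straight line between the two positions.
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    # Rotation: geodesic on SO(3), r_t = r0 * exp(t * log(r0^T r1)).
    rel = rot_log(matmul(transpose(r0), r1))
    r_t = matmul(r0, rot_exp([t * comp for comp in rel]))
    return x_t, r_t

# Illustrative inputs only: identity prior frame, data frame rotated
# 90 degrees about the z axis and shifted to (2, 4, 6).
prior_rot = rot_exp([0.0, 0.0, 0.0])
data_rot = rot_exp([0.0, 0.0, math.pi / 2.0])
x_half, r_half = interpolate_frame([0.0, 0.0, 0.0], prior_rot,
                                   [2.0, 4.0, 6.0], data_rot, 0.5)
```

In a training loop, a network would typically be asked to predict the velocity (or the clean frame) from the interpolated frame and the time `t`; how P2DFlow conditions that prediction on its prior and extra dimension is described in the paper and repository rather than here.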