Learning Implicit Bias in Generative Spaces for Accelerating Protein Dynamics Emulation

📅 2026-06-01

🤖 AI Summary
This work addresses the tendency of generative models of protein dynamics to revisit known structures rather than reach rare conformational states under long-horizon extrapolation. The authors introduce an enhanced-sampling strategy that operates in the generative space of a pretrained emulator, which stays frozen (no fine-tuning). A history-aware score estimator applies a distance-weighted bias that steers reverse-time sampling away from previously generated conformations, regularized by an environment-support term; a score-based refinement step then re-projects drifted samples onto the data manifold to preserve structural validity. On DynamicPDB-80, the method raises diversity by 35%. On 12 zero-shot Fast-Folding proteins, the learned bias alone reaches the unbiased emulator's coverage up to ~15× faster, and combining it with refinement reaches that coverage up to ~37× faster while covering about 3× as many low-energy states.
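To make the "N× faster to baseline coverage" figures concrete, here is a minimal sketch of one way such a speedup could be computed. It is an assumption for illustration, not the paper's evaluation protocol: coverage is taken to be the number of distinct discrete state labels (for example, cluster IDs) seen so far, and all function names and the toy trajectories are hypothetical.

```python
def coverage_curve(state_ids):
    """Number of distinct states visited after each step of a trajectory."""
    seen = set()
    curve = []
    for s in state_ids:
        seen.add(s)
        curve.append(len(seen))
    return curve


def steps_to_reach(curve, target):
    """First 1-based step at which coverage reaches `target`, or None."""
    for step, covered in enumerate(curve, start=1):
        if covered >= target:
            return step
    return None


def coverage_speedup(baseline_ids, biased_ids):
    """Ratio of steps needed to match the baseline's final coverage.

    Assumed definition: the target is the coverage the baseline trajectory
    reaches by its last step. Returns None if the biased trajectory never
    reaches it or if the baseline is empty.
    """
    base_curve = coverage_curve(baseline_ids)
    if not base_curve:
        return None
    target = base_curve[-1]
    base_steps = steps_to_reach(base_curve, target)
    biased_steps = steps_to_reach(coverage_curve(biased_ids), target)
    if biased_steps is None:
        return None
    return base_steps / biased_steps


# Toy state labels, not data from the paper.
baseline = [0, 0, 1, 1, 1, 2]   # reaches 3 distinct states at step 6
biased = [0, 1, 2]              # reaches 3 distinct states at step 3
speedup = coverage_speedup(baseline, biased)
```

With these toy labels the biased trajectory matches the baseline's final coverage in half the steps. How states are discretized, and whether cost is counted in steps or wall-clock time, would change the number, so the definition should be read as one plausible choice.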
📝 Abstract
Generative emulators of protein dynamics produce plausible trajectories at a fraction of the cost of molecular dynamics, but they inherit their training distribution and tend to revisit known states rather than reach rare ones under long-horizon extrapolation. Inspired by classical enhanced sampling, we introduce an implicit, history-dependent bias in the generative space of a pretrained emulator. Specifically, a history-aware score estimator augments the frozen emulator with a distance-weighted bias that steers reverse-time sampling away from previously generated structures, regularized by an environment-support term. To preserve structural validity at long horizons, a score-based refinement step re-projects drifted samples onto the data manifold using the frozen emulator. Our experiments demonstrate that (i) on DynamicPDB-80, the method raises diversity by $35\%$; and (ii) on $12$ zero-shot Fast-Folding proteins, the learned bias alone reaches the unbiased emulator's coverage up to ${\sim}15\times$ faster, while pairing it with refinement reaches that coverage up to ${\sim}37\times$ faster and covers ${\sim}3\times$ as many low-energy states. Code will be released soon.
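The idea of a distance-weighted, history-dependent bias added to a frozen score can be sketched as follows. This is a hand-written stand-in, not the paper's learned estimator: it assumes a metadynamics-style sum of Gaussian hills centred on past samples, and it omits the environment-support term and the refinement step. The names `repulsive_bias_score`, `biased_score`, `toy_base_score`, and the parameters `sigma`, `weight`, and `strength` are all hypothetical.

```python
import numpy as np


def repulsive_bias_score(x, history, sigma=1.0, weight=1.0):
    """Negative gradient of a sum of Gaussian hills placed on past samples.

    x: (d,) current sample. history: (k, d) previously generated samples.
    Returns a (d,) vector that points away from nearby history points and
    decays with distance (the assumed "distance-weighted" behaviour).
    """
    x = np.asarray(x, dtype=float)
    history = np.asarray(history, dtype=float)
    if history.size == 0:
        return np.zeros_like(x)
    diff = x - history.reshape(-1, x.shape[0])           # (k, d)
    sq_dist = np.sum(diff**2, axis=1)                    # (k,)
    kernel = weight * np.exp(-sq_dist / (2.0 * sigma**2))
    return (kernel[:, None] * diff).sum(axis=0) / sigma**2


def biased_score(x, t, base_score, history, strength=1.0, sigma=1.0):
    """Frozen base score plus a scaled repulsive term (assumed additive form)."""
    return base_score(x, t) + strength * repulsive_bias_score(x, history, sigma)


def toy_base_score(x, t):
    """Placeholder for the frozen emulator: score of a standard normal."""
    return -np.asarray(x, dtype=float)


x = np.array([1.0, 0.0])
history = np.array([[0.0, 0.0]])         # one previously generated sample
bias = repulsive_bias_score(x, history)  # pushes x away from the origin
total = biased_score(x, 0.5, toy_base_score, history)
```

In this toy setting the base score pulls the sample toward the origin while the bias pushes it away from the visited point, so the combined score is weaker than the base score alone. The additive combination and the Gaussian kernel are design choices made for the sketch; a learned estimator could weight history quite differently.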
Problem

Research questions and friction points this paper is trying to address.

protein dynamics
generative emulation
rare states
sampling bias
long-horizon extrapolation
Innovation

Methods, ideas, or system contributions that make the work stand out.

implicit bias
generative emulation
history-dependent sampling
score-based refinement
protein dynamics
Kaihui Cheng
Fudan University; Shanghai Academy of AI for Science
Zhiqiang Cai
Shanghai Academy of AI for Science
Wenkai Xiang
Shanghai Academy of AI for Science
Zhihang Hu
Shanghai Academy of AI for Science
Siyu Zhu
Tzuhsiung Yang
Shanghai Academy of AI for Science
Yuan Qi
Fudan University; Shanghai Academy of AI for Science