Learning Implicit Bias in Generative Spaces for Accelerating Protein Dynamics Emulation

📅 2026-06-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the tendency of generative emulators of protein dynamics to revisit known structures rather than reach rare conformational states under long-horizon extrapolation. The authors introduce an enhanced-sampling strategy that operates in the generative space of a pretrained emulator, which stays frozen and requires no fine-tuning. A history-aware score estimator applies a distance-weighted bias that steers reverse-time sampling away from previously generated conformations, regularized by an environment-support term. A score-based refinement step then re-projects drifted samples onto the data manifold to preserve structural validity at long horizons. On DynamicPDB-80, the method raises trajectory diversity by 35%. On 12 zero-shot fast-folding proteins, the bias alone reaches the unbiased emulator's coverage up to roughly 15 times faster, and combining it with refinement reaches that coverage up to roughly 37 times faster while covering about three times as many low-energy states.

📝 Abstract

Generative emulators of protein dynamics produce plausible trajectories at a fraction of the cost of molecular dynamics, but they inherit their training distribution and tend to revisit known states rather than reach rare ones under long-horizon extrapolation. Inspired by classical enhanced sampling, we introduce an implicit, history-dependent bias in the generative space of a pretrained emulator. Specifically, a history-aware score estimator augments the frozen emulator with a distance-weighted bias that steers reverse-time sampling away from previously generated structures, regularized by an environment-support term. To preserve structural validity at long horizons, a score-based refinement step re-projects drifted samples onto the data manifold using the frozen emulator. Our experiments demonstrate that (i) the method raises diversity by $35\%$ on DynamicPDB-80; and (ii) on $12$ zero-shot Fast-Folding proteins, the learned bias alone reaches the unbiased emulator's coverage up to ${\sim}15\times$ faster, and pairing it with refinement reaches that coverage up to ${\sim}37\times$ faster while covering ${\sim}3\times$ as many low-energy states. Code will be released soon.
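
To make the idea of a distance-weighted, history-dependent bias concrete, the sketch below adds a repulsive term to a score function. It is an illustration only and not the authors' method: the abstract does not give the functional form of the bias, and the paper describes the bias as learned, whereas this sketch uses a fixed Gaussian kernel in the spirit of classical metadynamics. The names `toy_score`, `history_bias`, `biased_score`, `bandwidth`, and `strength` are all hypothetical, and a standard-Gaussian score stands in for the frozen emulator.

```python
import numpy as np


def toy_score(x):
    """Score (gradient of log-density) of a standard Gaussian.

    Assumption: a stand-in for the frozen emulator's score network.
    """
    return -x


def history_bias(x, history, bandwidth=1.0, strength=1.0):
    """Repulsive bias pushing x away from previously generated samples.

    Assumed form (not from the paper): the gradient of
        -strength * sum_i exp(-||x - h_i||^2 / (2 * bandwidth^2)),
    so nearby history points contribute more than distant ones.
    """
    if len(history) == 0:
        return np.zeros_like(x)
    diff = x - np.asarray(history)                      # shape (n, d)
    sq_dist = np.sum(diff ** 2, axis=1)                 # shape (n,)
    weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2)) # distance weighting
    return strength * (weights[:, None] * diff).sum(axis=0) / bandwidth ** 2


def biased_score(x, history, bandwidth=1.0, strength=1.0):
    """Frozen score plus the history-dependent bias (no retraining involved)."""
    return toy_score(x) + history_bias(x, history, bandwidth, strength)


if __name__ == "__main__":
    visited = [np.zeros(2)]          # one previously generated sample
    x = np.array([0.5, 0.0])         # current sample, close to the visited one
    print("unbiased score:", toy_score(x))
    print("biased score:  ", biased_score(x, visited))
```

In this toy setting the bias vanishes far from the history and is largest at intermediate distances, so it only reshapes sampling near states that have already been generated. The environment-support regularizer and the score-based refinement step described in the abstract are not modeled here.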

Problem

Research questions and friction points this paper is trying to address.

protein dynamics

generative emulation

rare states

sampling bias

long-horizon extrapolation

Innovation

Methods, ideas, or system contributions that make the work stand out.

implicit bias

generative emulation

history-dependent sampling