Lattice Protein Folding with Variational Annealing

📅 2025-02-28
🤖 AI Summary
The two-dimensional HP lattice protein folding problem is an NP-hard combinatorial optimization task. This paper introduces the Masked Variational Annealing (MVA) framework, which models the sequence-to-structure mapping with a sparse recurrent neural network (RNN), integrates temperature-driven variational annealing sampling with an energy-guided dynamic masking mechanism, and explicitly excludes invalid conformations during autoregressive generation. The authors propose an upper-bound-guided masking training strategy that preserves the RNN's representational capacity while improving search efficiency and conformational feasibility. The method generalizes naturally to three-dimensional lattices and to multi-letter alphabets. On the standard 60-residue benchmark set, MVA achieves, for the first time, exact prediction of all known optimal conformations, substantially outperforming conventional heuristic algorithms and state-of-the-art learning-based approaches.
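The objective being minimized is the standard HP contact energy: each pair of hydrophobic (H) beads that are adjacent on the lattice but not consecutive along the chain contributes -1. A minimal sketch of this energy function (function and variable names here are illustrative, not from the paper):

```python
def hp_energy(sequence, coords):
    """2D HP lattice energy: -1 per non-consecutive H-H pair on
    adjacent lattice sites.

    sequence: string of 'H'/'P'; coords: list of (x, y) lattice sites.
    """
    energy = 0
    n = len(sequence)
    for i in range(n):
        for j in range(i + 2, n):  # skip chain neighbours (i, i+1)
            if sequence[i] == 'H' and sequence[j] == 'H':
                dx = abs(coords[i][0] - coords[j][0])
                dy = abs(coords[i][1] - coords[j][1])
                if dx + dy == 1:  # nearest neighbours on the square lattice
                    energy -= 1
    return energy

# A 4-bead "HHHH" chain folded into a unit square has one non-consecutive
# H-H contact, between beads 0 and 3:
print(hp_energy("HHHH", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # -1
```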

📝 Abstract
Understanding the principles of protein folding is a cornerstone of computational biology, with implications for drug design, bioengineering, and the understanding of fundamental biological processes. Lattice protein folding models offer a simplified yet powerful framework for studying the complexities of protein folding, enabling the exploration of energetically optimal folds under constrained conditions. However, finding these optimal folds is a computationally challenging combinatorial optimization problem. In this work, we introduce a novel upper-bound training scheme that employs masking to identify the lowest-energy folds in two-dimensional Hydrophobic-Polar (HP) lattice protein folding. By leveraging Dilated Recurrent Neural Networks (RNNs) integrated with an annealing process driven by temperature-like fluctuations, our method accurately predicts optimal folds for benchmark systems of up to 60 beads. Our approach also effectively masks invalid folds from being sampled without compromising the autoregressive sampling properties of RNNs. This scheme is generalizable to three spatial dimensions and can be extended to lattice protein models with larger alphabets. Our findings emphasize the potential of advanced machine learning techniques in tackling complex protein folding problems and a broader class of constrained combinatorial optimization challenges.
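The abstract's claim that invalid folds are "masked from being sampled without compromising the autoregressive sampling properties" can be sketched as follows: at each generation step, moves that would revisit an occupied lattice site have their probability zeroed and the remainder renormalized, so every sampled conformation is self-avoiding by construction. The uniform logits below are a hypothetical stand-in for the paper's RNN policy:

```python
import numpy as np

# Lattice moves on the 2D square lattice: right, left, up, down.
MOVES = {0: (1, 0), 1: (-1, 0), 2: (0, 1), 3: (0, -1)}

def masked_sample_walk(n_steps, rng):
    """Grow a self-avoiding walk by masking (zeroing the probability of)
    any move that would land on an already-occupied site."""
    path = [(0, 0)]
    occupied = {(0, 0)}
    for _ in range(n_steps):
        x, y = path[-1]
        logits = np.zeros(4)  # stand-in for the RNN's conditional logits
        mask = np.array([(x + dx, y + dy) not in occupied
                         for dx, dy in MOVES.values()])
        if not mask.any():    # walk is trapped: no valid continuation
            return None
        probs = np.exp(logits) * mask
        probs /= probs.sum()  # renormalize over the valid moves only
        move = rng.choice(4, p=probs)
        dx, dy = MOVES[move]
        path.append((x + dx, y + dy))
        occupied.add(path[-1])
    return path

rng = np.random.default_rng(0)
walk = masked_sample_walk(10, rng)
# Self-avoidance holds for every sampled conformation:
assert walk is None or len(set(walk)) == len(walk)
```

Because the mask is applied before normalization at each step, the sampler remains exactly autoregressive: the per-step probabilities still multiply to a normalized distribution over valid conformations.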
Problem

Research questions and friction points this paper is trying to address.

Develops a method to find lowest-energy protein folds in 2D HP lattice models.
Uses Dilated RNNs with annealing to predict optimal folds for up to 60 beads.
Generalizes approach to 3D and larger alphabet lattice protein models.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Upper-bound training scheme with masking
Dilated RNNs integrated with annealing process
Autoregressive sampling without invalid folds
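The annealing ingredient listed above can be sketched as minimizing a variational free energy F = ⟨E⟩ - T·H, where H is the entropy of the sampler, while a temperature schedule drives T toward zero so the objective collapses onto the mean energy. This is a minimal sketch of that assumed form (the linear schedule and function names are illustrative):

```python
import numpy as np

def free_energy(energies, log_probs, T):
    """Monte-Carlo estimate of F = <E> + T*<log q> over samples drawn
    from the variational distribution q (note <log q> = -entropy)."""
    return np.mean(energies) + T * np.mean(log_probs)

def linear_schedule(T0, n_steps):
    """Anneal T from T0 down to 0 over n_steps training steps."""
    return [T0 * (1 - k / n_steps) for k in range(n_steps + 1)]

# At T = 0 the objective reduces to the mean sampled energy alone:
E = np.array([-3.0, -5.0])
logq = np.array([-2.0, -2.5])
print(free_energy(E, logq, 0.0))  # -4.0
```

At high T the entropy term keeps the sampler exploratory; the schedule then trades exploration for exploitation as training proceeds.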
Shoummo Ahsan Khandoker
Department of Computer Science, Indiana University-Bloomington, Bloomington, IN 47405, USA
E. Inack
Perimeter Institute for Theoretical Physics, Waterloo, Ontario, Canada; Department of Physics and Astronomy, University of Waterloo, Waterloo, Ontario, Canada
Mohamed Hibat-Allah
Assistant Professor, University of Waterloo
Quantum Physics · Statistical Physics · Machine Learning