Efficient Protein Optimization via Structure-aware Hamiltonian Dynamics

📅 2026-01-16

🤖 AI Summary

Traditional sequence optimization methods struggle to model high-dimensional epistatic effects in proteins and often neglect structural constraints, which makes them inefficient. This work proposes HADES, an approach that integrates the mutual constraints between structure and sequence into the optimization process. HADES uses a two-stage encoder–decoder architecture to model structure–function relationships within mutant neighborhoods, and combines Hamiltonian dynamics with Bayesian optimization to sample efficiently from a structure-aware approximate posterior distribution. To turn the continuous state into feasible discrete sequences, it introduces a position discretization procedure. By leveraging physical momentum and uncertainty-guided exploration over a smoothed fitness–structure landscape, HADES outperforms state-of-the-art baselines on most metrics in in-silico evaluations and designs protein sequences that keep similar structures while improving functional properties.

📝 Abstract

The ability to engineer optimized protein variants has transformative potential for biotechnology and medicine. Prior sequence-based optimization methods struggle with the high-dimensional complexity caused by epistasis and disregard structural constraints. To address this, we propose HADES, a Bayesian optimization method that uses Hamiltonian dynamics to sample efficiently from a structure-aware approximate posterior. By leveraging momentum and uncertainty in the simulated physical movements, HADES moves proposals rapidly toward promising regions. A position discretization procedure is introduced to propose discrete protein sequences from this continuous-state system. The posterior surrogate is powered by a two-stage encoder-decoder framework that models the structure and function relationships between mutant neighbors, thereby learning a smoothed landscape to sample from. Extensive experiments demonstrate that our method outperforms state-of-the-art baselines in in-silico evaluations across most metrics. Notably, our approach leverages the mutual constraints between protein structure and sequence, which facilitates the design of protein sequences with similar structures and optimized properties. The code and data are publicly available at https://github.com/GENTEL-lab/HADES.
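
To make the sampling idea concrete, the following is a minimal sketch of Hamiltonian Monte Carlo over a continuous relaxation of a sequence, followed by a per-position discretization. It is not the HADES implementation. The quadratic `potential` is a toy stand-in for the learned structure-aware surrogate, the argmax rule in `discretize` is an assumed simplification of the paper's position discretization, and all function names and parameters (`leapfrog`, `hmc_step`, `step_size`, `n_steps`) are hypothetical.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues


def one_hot(seq):
    """Encode a sequence as an (L, 20) one-hot matrix."""
    out = np.zeros((len(seq), len(AMINO_ACIDS)))
    out[np.arange(len(seq)), [AMINO_ACIDS.index(a) for a in seq]] = 1.0
    return out


def potential(x, target):
    """Toy energy (assumption): quadratic bowl centred on `target`.
    In a real system this would be a negative log posterior surrogate."""
    return 0.5 * np.sum((x - target) ** 2)


def grad_potential(x, target):
    """Gradient of the toy energy with respect to x."""
    return x - target


def leapfrog(x, p, target, step_size, n_steps):
    """Integrate Hamiltonian dynamics with the leapfrog scheme."""
    x, p = x.copy(), p.copy()
    p -= 0.5 * step_size * grad_potential(x, target)  # initial half kick
    for i in range(n_steps):
        x += step_size * p                            # full drift
        if i < n_steps - 1:
            p -= step_size * grad_potential(x, target)  # full kick
    p -= 0.5 * step_size * grad_potential(x, target)  # final half kick
    return x, p


def hmc_step(x, target, rng, step_size=0.1, n_steps=20):
    """One HMC transition: resample momentum, integrate, accept or reject."""
    p0 = rng.standard_normal(x.shape)
    x_new, p_new = leapfrog(x, p0, target, step_size, n_steps)
    h_old = potential(x, target) + 0.5 * np.sum(p0 ** 2)
    h_new = potential(x_new, target) + 0.5 * np.sum(p_new ** 2)
    if np.log(rng.uniform()) < h_old - h_new:  # Metropolis correction
        return x_new, True
    return x, False


def discretize(x):
    """Assumed discretization: pick the highest-scoring residue per position."""
    return "".join(AMINO_ACIDS[i] for i in np.argmax(x, axis=1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = 5.0 * one_hot("MKTAYIAK")  # hypothetical low-energy region
    x = np.zeros_like(target)
    for _ in range(200):
        x, _accepted = hmc_step(x, target, rng)
    print(discretize(x))
```

The momentum variable lets each proposal travel a long distance along the energy gradient before the accept/reject test, which is the property the abstract relies on for rapid transitions toward promising regions. In this sketch the discrete proposal is read off only at the end, whereas a full method would score each discretized proposal with the surrogate and its uncertainty.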

Problem

Research questions and friction points this paper is trying to address.

protein optimization

epistasis

structural constraints

high-dimensional complexity

sequence-structure relationship

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hamiltonian dynamics

structure-aware optimization

Bayesian optimization