Efficient Protein Optimization via Structure-aware Hamiltonian Dynamics

📅 2026-01-16
📈 Citations: 0
Influential: 0
🤖 AI Summary
Prior sequence-based optimization methods struggle to model high-dimensional epistatic effects in proteins and often ignore structural constraints, which makes the search inefficient. This work proposes HADES, a method that builds the mutual constraints between protein structure and sequence into the optimization process. A two-stage encoder–decoder surrogate models structure–function relationships among neighboring mutants and thereby learns a smoothed fitness landscape. HADES then combines Hamiltonian dynamics with Bayesian optimization to sample efficiently from this structure-aware approximate posterior. Momentum and uncertainty in the simulated dynamics move proposals quickly toward promising regions, and a positional discretization step converts the continuous states into feasible discrete sequences. In in-silico evaluations, HADES outperforms state-of-the-art baselines on most metrics and designs protein sequences that keep structures similar to the original while improving the target properties.
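To make "sampling with Hamiltonian dynamics" concrete, the sketch below implements one generic Hamiltonian Monte Carlo (HMC) transition with a leapfrog integrator and a Metropolis correction. It is not the HADES implementation. It assumes a length-L protein is relaxed to an L x 20 real-valued matrix (one row per position) and uses a toy Gaussian log-density, centered on a hypothetical `target` matrix, as a stand-in for the learned structure-aware posterior. The names `log_density`, `grad_log_density`, `hmc_step`, and `target` are illustrative.

```python
import numpy as np

# Assumption: a length-L protein is relaxed to an L x 20 real-valued matrix
# (one row per position, one column per amino acid). The Gaussian below is a
# toy stand-in for a learned, structure-aware surrogate posterior.
def log_density(x, target):
    return -0.5 * np.sum((x - target) ** 2)

def grad_log_density(x, target):
    return -(x - target)

def hmc_step(x, target, rng, step_size=0.1, n_leapfrog=20):
    """One Hamiltonian Monte Carlo transition using the leapfrog integrator."""
    p0 = rng.standard_normal(x.shape)  # resample Gaussian momentum
    x_new, p = x.copy(), p0.copy()
    # Leapfrog: half momentum step, alternating full steps, final half step.
    p += 0.5 * step_size * grad_log_density(x_new, target)
    for i in range(n_leapfrog):
        x_new += step_size * p
        if i < n_leapfrog - 1:
            p += step_size * grad_log_density(x_new, target)
    p += 0.5 * step_size * grad_log_density(x_new, target)
    # Hamiltonian = potential (-log density) + kinetic (|p|^2 / 2).
    h_old = -log_density(x, target) + 0.5 * np.sum(p0 ** 2)
    h_new = -log_density(x_new, target) + 0.5 * np.sum(p ** 2)
    # Metropolis correction for the integrator's discretization error.
    if np.log(rng.uniform()) < h_old - h_new:
        return x_new, True
    return x, False

rng = np.random.default_rng(0)
target = rng.standard_normal((4, 20))  # hypothetical high-density region
x = np.zeros_like(target)
samples, accepted = [], 0
for _ in range(500):
    x, ok = hmc_step(x, target, rng)
    accepted += ok
    samples.append(x)
```

The momentum term lets each proposal travel far along the gradient of the log-density in a single transition, which is the property the summary credits with moving proposals quickly toward promising regions. In a real pipeline the toy density would be replaced by the surrogate's posterior, and the continuous samples would still need to be discretized into sequences.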

📝 Abstract
The ability to engineer optimized protein variants has transformative potential for biotechnology and medicine. Prior sequence-based optimization methods struggle with the high-dimensional complexity caused by epistasis and with their disregard of structural constraints. To address this, we propose HADES, a Bayesian optimization method that uses Hamiltonian dynamics to sample efficiently from a structure-aware approximate posterior. By exploiting momentum and uncertainty in the simulated physical motion, HADES moves proposals rapidly toward promising regions. A position discretization procedure is introduced to propose discrete protein sequences from this continuous-state system. The posterior surrogate is a two-stage encoder-decoder framework that models the structure and function relationships among neighboring mutants and thereby learns a smoothed landscape to sample from. Extensive experiments show that our method outperforms state-of-the-art baselines on most metrics in in-silico evaluations. Remarkably, our approach offers a unique advantage by leveraging the mutual constraints between protein structure and sequence, which facilitates the design of protein sequences with similar structures and optimized properties. The code and data are publicly available at https://github.com/GENTEL-lab/HADES.
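The abstract does not spell out how the position discretization works, so the following is only one plausible reading, offered as a sketch. It takes the arg-max residue at each position of an L x 20 continuous state. It can also enforce a mutation budget by keeping only the mutations with the largest logit margin over the wild-type residue. The function `discretize_position`, the `max_mutations` cap, and the margin-based ranking are hypothetical choices, not the procedure from the paper.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def discretize_position(x, wild_type, max_mutations=None):
    """Map an L x 20 continuous state to a discrete sequence.

    Hypothetical rule: take the arg-max residue at each position; if
    max_mutations is set, keep only the mutations whose logit margin over
    the wild-type residue is largest and revert the rest to wild type.
    """
    wt_idx = np.array([AMINO_ACIDS.index(a) for a in wild_type])
    best = x.argmax(axis=1)
    rows = np.arange(len(wild_type))
    margin = x[rows, best] - x[rows, wt_idx]  # 0 where arg-max is wild type
    mutated = np.flatnonzero(best != wt_idx)
    if max_mutations is not None and len(mutated) > max_mutations:
        # Keep the strongest mutations; all other positions revert to wild type.
        keep = mutated[np.argsort(-margin[mutated])[:max_mutations]]
        chosen = wt_idx.copy()
        chosen[keep] = best[keep]
        best = chosen
    return "".join(AMINO_ACIDS[i] for i in best)

wild_type = "MKTA"
x = np.zeros((len(wild_type), 20))
for i, aa in enumerate(wild_type):
    x[i, AMINO_ACIDS.index(aa)] = 1.0  # start from the wild-type one-hot
x[1, AMINO_ACIDS.index("R")] = 2.0     # strong preference for K2R
x[3, AMINO_ACIDS.index("G")] = 1.5     # weaker preference for A4G
print(discretize_position(x, wild_type))                   # MRTG
print(discretize_position(x, wild_type, max_mutations=1))  # MRTA
```

Capping the number of mutations keeps proposals inside a local mutant neighborhood of the wild type, which fits the abstract's focus on relationships between mutant neighbors. Whether HADES applies such a cap is not stated in this summary.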
Problem

Research questions and challenges this paper addresses.

protein optimization
epistasis
structural constraints
high-dimensional complexity
sequence-structure relationship
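Epistasis means that the effect of a mutation depends on the sequence background it is introduced into. The toy two-site model below uses made-up numbers and hypothetical mutation names (`K2R`, `A4G`). It shows how a site-independent (additive) predictor can get even the sign of a double mutant wrong, which is one reason the search space is hard to model.

```python
from itertools import combinations

# Made-up effects: single-mutation gains and one pairwise epistatic term.
SINGLE = {"K2R": 0.5, "A4G": 0.25}
PAIRWISE = {frozenset({"K2R", "A4G"}): -1.0}

def fitness(mutations):
    """Fitness change relative to wild type under the toy epistatic model."""
    total = sum(SINGLE[m] for m in mutations)
    for pair in combinations(mutations, 2):
        total += PAIRWISE.get(frozenset(pair), 0.0)
    return total

def additive_prediction(mutations):
    """Site-independent model: simply sum the single-mutant effects."""
    return sum(SINGLE[m] for m in mutations)

double = ["K2R", "A4G"]
print(additive_prediction(double))  # 0.75: predicts an improvement
print(fitness(double))              # -0.25: the combination is deleterious
```

With L positions, the number of possible pairwise terms grows as L(L-1)/2, and higher-order terms grow faster still. This is the high-dimensional complexity listed above.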
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hamiltonian dynamics
structure-aware optimization
Bayesian optimization
protein engineering
discrete sequence sampling
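The source describes uncertainty-guided exploration inside Bayesian optimization but does not name the acquisition function. As an illustration only, the sketch below swaps in a standard upper confidence bound (UCB) rule. It scores candidates with an ensemble of surrogate predictions and uses ensemble disagreement as the uncertainty estimate. The function `ucb_select`, the `beta` weight, and the prediction values are hypothetical.

```python
import numpy as np

def ucb_select(ensemble_preds, beta=2.0, batch_size=2):
    """Pick candidates by upper confidence bound over an ensemble.

    ensemble_preds: array of shape (n_models, n_candidates) holding each
    surrogate model's predicted fitness for each candidate sequence.
    """
    mean = ensemble_preds.mean(axis=0)
    std = ensemble_preds.std(axis=0)  # model disagreement as uncertainty
    score = mean + beta * std         # optimism in the face of uncertainty
    order = np.argsort(-score)        # highest score first
    return order[:batch_size], score

# Hypothetical predictions from 3 surrogate models for 4 candidates.
preds = np.array([
    [0.9, 0.5, 0.2, 0.6],
    [0.9, 0.1, 0.2, 0.6],
    [0.9, 0.9, 0.2, 0.6],
])
chosen, score = ucb_select(preds, beta=2.0, batch_size=2)
print(chosen.tolist())  # [1, 0]
greedy, _ = ucb_select(preds, beta=0.0, batch_size=2)
print(greedy.tolist())  # [0, 3]
```

With `beta=2.0` the uncertain candidate 1 is selected alongside the confident best candidate 0. With `beta=0.0` the rule becomes purely greedy and picks candidates 0 and 3. The weight `beta` controls the exploration and exploitation trade-off that uncertainty-guided search relies on.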