Representing local protein environments with atomistic foundation models

📅 2025-05-29
🤖 AI Summary
This study addresses the challenge of concisely representing local protein environments, whose extensive structural and chemical variability makes them difficult to model. The authors propose a representation of a local protein environment derived from the intermediate features of atomistic foundation models (AFMs). The resulting embedding captures local structure (e.g., secondary-structure motifs) and chemical features (e.g., amino-acid identity and protonation state), and the representation space is structured enough to support data-driven priors over the distribution of biomolecular environments. In biomolecular nuclear magnetic resonance (NMR) spectroscopy, the representation enables a physics-informed chemical shift predictor that the authors describe as the first of its kind and report as achieving state-of-the-art accuracy.

📝 Abstract
The local structure of a protein strongly impacts its function and interactions with other molecules. Therefore, a concise, informative representation of a local protein environment is essential for modeling and designing proteins and biomolecular interactions. However, these environments' extensive structural and chemical variability makes them challenging to model, and such representations remain under-explored. In this work, we propose a novel representation for a local protein environment derived from the intermediate features of atomistic foundation models (AFMs). We demonstrate that this embedding effectively captures both local structure (e.g., secondary motifs) and chemical features (e.g., amino-acid identity and protonation state). We further show that the AFM-derived representation space exhibits meaningful structure, enabling the construction of data-driven priors over the distribution of biomolecular environments. Finally, in the context of biomolecular NMR spectroscopy, we demonstrate that the proposed representations enable a first-of-its-kind physics-informed chemical shift predictor that achieves state-of-the-art accuracy. Our results demonstrate the surprising effectiveness of atomistic foundation models and their emergent representations for protein modeling beyond traditional molecular simulations. We believe this will open new lines of work in constructing effective functional representations for protein environments.
Problem

Research questions and friction points this paper is trying to address.

Modeling diverse local protein structures and chemical features
Developing concise representations for protein function and interactions
Creating accurate physics-informed chemical shift predictors
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses atomistic foundation models for protein representation
Captures local structure and chemical features effectively
Enables physics-informed chemical shift predictor
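
The core idea above (per-atom intermediate features turned into one vector per local environment, then used as input to a predictor) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the random `atom_features` array stands in for AFM intermediate-layer features, the radius-based mean pooling in `environment_embedding` and the ridge readout in `fit_ridge` are hypothetical choices, and the targets are synthetic. The summary does not describe the authors' actual pooling scheme or predictor, so this should not be read as their method.

```python
import numpy as np


def environment_embedding(atom_features, positions, center_idx, cutoff=5.0):
    """Mean-pool per-atom features over atoms within `cutoff` of a center atom.

    Hypothetical pooling choice; the center atom is always included
    because its distance to itself is zero.
    """
    dists = np.linalg.norm(positions - positions[center_idx], axis=1)
    mask = dists <= cutoff
    return atom_features[mask].mean(axis=0)


def fit_ridge(X, y, alpha=1e-3):
    """Closed-form ridge regression: w = (X^T X + alpha * I)^{-1} X^T y."""
    n_feats = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_feats), X.T @ y)


rng = np.random.default_rng(0)
n_atoms, n_feats = 200, 16

# Synthetic stand-ins: random coordinates and random "intermediate features".
positions = rng.uniform(0.0, 20.0, size=(n_atoms, 3))
atom_features = rng.normal(size=(n_atoms, n_feats))

# One embedding per environment, centered on the first 50 atoms.
centers = np.arange(50)
X = np.stack(
    [environment_embedding(atom_features, positions, i) for i in centers]
)

# Synthetic linear targets standing in for chemical shifts.
true_w = rng.normal(size=n_feats)
y = X @ true_w

# Fit a linear readout on the embeddings and predict.
w = fit_ridge(X, y, alpha=1e-8)
pred = X @ w
```

A linear readout is used here only because it is the simplest probe of whether an embedding carries the target information; any regressor could take its place.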
Meital Bojan
IST Austria
S. Vedula
IST Austria, Technion, Israel
Advaith Maddipatla
IST Austria, Technion, Israel, University of Oxford, UK
Nadav Bojan Sellam
IST Austria
Federico Napoli
IST Austria
Paul Schanda
Institute of Science and Technology Austria
NMR spectroscopy, protein structure, Protein Dynamics, solid-state NMR, biophysics
Alexander M Bronstein
IST Austria, Technion, Israel