Flexible Kernels for Protein Property Prediction

📅 2026-06-09
📈 Citations: 0
Influential: 0
🤖 AI Summary
Predicting protein properties such as binding affinity and thermostability from sparse experimental data remains challenging. This work introduces a class of sequence kernels for Gaussian process models that combine evolutionary substitution matrices with a local linearity assumption. The kernels can also incorporate structural information from foundation models by learning what are, in effect, structure-aware substitution matrices. The authors report that the resulting Gaussian processes are data-efficient and frequently outperform alternatives built on foundation model embeddings. They further report that the structure-conditioned kernels are well suited to multi-task learning across multiple protein property landscapes, where they can decisively outperform local supervised learning methods.
📝 Abstract
Despite its importance to applications in protein design, predicting protein properties like binding affinity and thermostability from sparse experimental data remains a significant challenge. Accordingly, we introduce a class of sequence kernels that exploit evolutionary substitution matrices as well as local linearity and demonstrate that the resulting Gaussian processes provide data-efficient models of protein property landscapes, frequently outperforming alternatives that rely on foundation model embeddings. Furthermore, by learning what are in effect structure-aware substitution matrices, we show that our kernels can readily incorporate structural information from foundation models. We demonstrate that these structure-conditioned kernels are well suited to multi-task learning across multiple protein property landscapes and can decisively outperform local supervised learning methods.
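To make the idea of a substitution-matrix sequence kernel concrete, here is a minimal sketch in Python. It is not the paper's construction: the abstract does not give the kernel's form, so the sketch assumes an additive, site-wise kernel (a linear kernel over per-site features) on equal-length sequences. The substitution matrix is a toy stand-in built from a hypothetical grouping of amino acids, not an evolutionary matrix such as BLOSUM, and the names `toy_substitution_matrix`, `sequence_kernel`, and `gp_posterior_mean`, as well as the sequences and measurements, are invented for illustration.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# Hypothetical coarse grouping of residues (an assumption, for illustration only).
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def toy_substitution_matrix(rho=0.5):
    """Toy positive semi-definite similarity matrix over amino acids.

    Identical residues score 1, residues in the same group score rho,
    and all other pairs score 0. This stands in for a real substitution matrix.
    """
    n = len(AMINO_ACIDS)
    same_group = np.zeros((n, n))
    for group in GROUPS:
        idx = [AA_INDEX[aa] for aa in group]
        same_group[np.ix_(idx, idx)] = 1.0
    # A sum of two PSD matrices (identity and block-of-ones) is PSD.
    return (1.0 - rho) * np.eye(n) + rho * same_group


def encode(seqs):
    """Map equal-length sequences to an integer array of shape (n, length)."""
    return np.array([[AA_INDEX[aa] for aa in s] for s in seqs])


def sequence_kernel(X, Y, S):
    """Average the per-site similarity S[x_i, y_i] over aligned positions."""
    return S[X[:, None, :], Y[None, :, :]].mean(axis=-1)


def gp_posterior_mean(K_train, K_cross, y, noise=1e-2):
    """Posterior mean of a zero-mean GP with Gaussian observation noise."""
    chol = np.linalg.cholesky(K_train + noise * np.eye(len(y)))
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y))
    return K_cross @ alpha


# Made-up variants and measurements, purely to exercise the code.
S = toy_substitution_matrix(rho=0.5)
train = encode(["ACDK", "ACEK", "AVDK", "GCDR"])
y = np.array([1.0, 0.8, 0.9, 0.2])
test = encode(["AVEK", "GCER"])

K_train = sequence_kernel(train, train, S)
K_cross = sequence_kernel(test, train, S)
pred = gp_posterior_mean(K_train, K_cross, y)
print(pred)
```

Because each per-site term is a valid kernel whenever `S` is positive semi-definite, their average is too, which is why the Cholesky factorization succeeds here. A real substitution matrix is not guaranteed to have this property and might need to be adjusted before use; learning the entries of `S` from data, as the abstract describes for the structure-aware case, is beyond this sketch.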
Problem

Research questions and friction points this paper is trying to address.

protein property prediction
sparse experimental data
binding affinity
thermostability
multi-task learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

flexible kernels
sequence kernels
Gaussian processes
structure-aware substitution matrices
multi-task learning