🤖 AI Summary
This study addresses data leakage in protein–ligand binding affinity prediction, where canonical PDBbind-style splits leak similar complexes across folds and artificially inflate apparent generalization. The authors adopt a leak-aware evaluation protocol with paired ablations, evaluated on the leak-proof LP-PDBBind splits, to isolate the contributions of two priors: frozen ESM-2 protein embeddings and a learned binary pocket-position marker. On the strict no-leak tiers, the lightweight 1D variant that keeps the pocket marker but drops ESM-2 achieves the best mean Pearson R, whereas the variant combining ESM-2 embeddings with the pocket marker performs best on the validation and CASF-2016 splits. These findings show how strongly the evaluation protocol shapes model rankings and challenge the practice of reporting a single default configuration.
📝 Abstract
Sequence-based deep learning offers a scalable alternative to structure-based scoring for protein-ligand binding affinity prediction. However, progress is hard to interpret when architectural priors are evaluated on canonical PDBbind-style splits that leak similarity classes across folds. We present HonestAffinity, a compact 1D-input predictor designed to isolate two priors under a leak-aware protocol: frozen ESM-2 (650M) protein embeddings and a learned binary pocket-position marker. We evaluate a multi-scale convolutional/Transformer template in three variants: HonestAffinity-Pocket, HonestAffinity-NoPocket, and HonestAffinity-Pocket-NoESM. All three train on 11,513 LP-PDBBind complexes in ~3 GPU-hours. We benchmark against five baselines on the LP-PDBBind three-tier no-leak hold-out, CASF-2016, and a CASF-2016 non-train subset. Our central finding is a split-conditioned reversal rather than a uniformly best prior: HonestAffinity-Pocket achieves the best mean Pearson R on the validation and CASF-2016 splits, whereas HonestAffinity-Pocket-NoESM achieves the best mean Pearson R on every strict LP no-leak tier (test_cl1-cl3). Both the pocket marker and the ESM-2 input improve performance on familiar splits but reduce Pearson R on the strict no-leak tiers. We argue that studies should report paired canonical and leak-proof ablations, and that variants matched to the deployment regime describe these reversals better than a single default configuration. Code and scripts are linked in the footnote; checkpoints will be released upon acceptance.
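As a rough illustration of what a paired canonical and leak-proof report could look like, the sketch below computes Pearson R per variant and per split and then picks the best variant on each split. It is not the authors' code: the function names (`pearson_r`, `paired_report`, `best_per_split`), the variant and split labels, and the tiny numeric arrays are all hypothetical placeholders chosen only so that a ranking reversal is visible.

```python
from math import sqrt


def pearson_r(y_true, y_pred):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    if var_t == 0 or var_p == 0:
        raise ValueError("Pearson R is undefined for constant input")
    return cov / sqrt(var_t * var_p)


def paired_report(predictions, labels):
    """Score every variant on every split.

    predictions: {variant: {split: [predicted affinity, ...]}}
    labels:      {split: [measured affinity, ...]}
    returns:     {split: {variant: Pearson R}}
    """
    return {
        split: {
            variant: pearson_r(y_true, per_split[split])
            for variant, per_split in predictions.items()
        }
        for split, y_true in labels.items()
    }


def best_per_split(report):
    """Name of the highest-scoring variant on each split."""
    return {split: max(scores, key=scores.get) for split, scores in report.items()}


# Made-up toy numbers (NOT results from the paper), for illustration only.
labels = {"canonical": [1, 2, 3, 4], "no_leak": [1, 2, 3, 4]}
predictions = {
    "variant_a": {"canonical": [1, 2, 3, 4], "no_leak": [2, 1, 4, 3]},
    "variant_b": {"canonical": [1, 3, 2, 4], "no_leak": [1, 2, 3, 5]},
}

report = paired_report(predictions, labels)
best = best_per_split(report)
print(best)
```

Reporting the full `report` table alongside `best` keeps a reversal between the canonical and no-leak columns visible instead of hiding it behind a single headline number.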