CaliPPer: quantifying, predicting and improving AI model performance for binding prediction

📅 2026-06-05

🤖 AI Summary

Current AI binding prediction models perform unreliably on new datasets, which hinders the discovery of therapeutic antibodies and T cell receptors (TCRs). This work proposes CaliPPer, a post-hoc framework for fine-grained, label-free performance estimation. It pairs a multi-chain Sample-to-Domain Distance (S2DD) with distance-aware Bayesian recalibration and operates at three resolutions: a generalisability score, aggregate performance prediction, and per-sample confidence estimation. Across ten models, eight architectures and two immune-receptor domains, it attains absolute distance–performance correlations of |r| = 0.80–0.92 and predicts AUROC/AP/F1 with mean absolute errors of 0.008–0.070. It also improves AUROC by up to 0.20 on unseen epitopes and variants, and raises true discovery rates in all five retrospective studies examined.

📝 Abstract

Binding prediction models accelerate therapeutic antibody and TCR discovery, but their performance on new datasets is unpredictable, often leading to low discovery rates. Density-ratio methods (PAPE, M-CBPE) provide label-free performance estimation for binary classification, but their assumptions and aggregate-only outputs limit binding prediction on neoepitopes, antigen variants and chemical scaffolds. Here we present CaliPPer (Calibration and Prediction of Performance), a post-hoc framework pairing a multi-chain Sample-to-Domain Distance (S2DD) with distance-aware Bayesian recalibration, operating at three resolutions: generalisability score, aggregate performance prediction, and per-sample confidence. Across ten models, eight architectures and two immune-receptor domains, CaliPPer attains distance--performance correlations $|r|=0.80\text{--}0.92$, predicts AUROC/AP/F1 with mean absolute errors $0.008\text{--}0.070$, and improves AUROC by up to $+0.20$ on unseen epitopes/variants. Applied retrospectively to five published TCR, BCR, MHC--peptide and small-molecule studies, CaliPPer raises true discovery rates in all five (e.g.\ $0/5 \to 3/5$ confirmed neoantigens), providing a triage layer between computational prediction and experimental validation.
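The two headline metrics in the abstract, the absolute distance--performance correlation $|r|$ and the mean absolute error of predicted performance, can be illustrated with a minimal sketch. This is not the authors' code: the function names `pearson_r` and `mean_absolute_error` and all numbers below are hypothetical, and the sketch assumes $r$ is a Pearson correlation computed over per-dataset (distance, performance) pairs, which the abstract does not state explicitly.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def mean_absolute_error(predicted, observed):
    """Mean absolute difference between predicted and observed metric values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical toy values, NOT results from the paper:
# one entry per held-out test set.
distances = [0.10, 0.25, 0.40, 0.55, 0.70]        # assumed distance to training domain
observed_auroc = [0.91, 0.86, 0.78, 0.72, 0.61]   # AUROC measured once labels are known
predicted_auroc = [0.90, 0.84, 0.80, 0.70, 0.64]  # AUROC estimated without labels

# The sign is dropped because performance is expected to fall as distance grows.
abs_r = abs(pearson_r(distances, observed_auroc))
mae = mean_absolute_error(predicted_auroc, observed_auroc)
print(f"|r| = {abs_r:.2f}, MAE = {mae:.3f}")
```

Under these assumptions, a large $|r|$ indicates that distance to the training domain tracks performance loss, and a small MAE indicates that label-free performance estimates stay close to the values measured after labels become available.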

Problem

Research questions and friction points this paper is trying to address.

binding prediction

model performance estimation

generalization

neoepitopes

antigen variants

Innovation

Methods, ideas, or system contributions that make the work stand out.

CaliPPer

Sample-to-Domain Distance

Bayesian recalibration