🤖 AI Summary
Binding prediction models perform unpredictably on new datasets, which hinders the discovery of therapeutic antibodies and T cell receptors (TCRs). To address this, the work proposes CaliPPer (Calibration and Prediction of Performance), a post-hoc framework for label-free performance estimation that, unlike aggregate-only density-ratio methods such as PAPE and M-CBPE, also reports per-sample confidence. It pairs a multi-chain Sample-to-Domain Distance (S2DD) with distance-aware Bayesian recalibration and operates at three resolutions: a generalizability score, aggregate performance prediction, and per-sample confidence. Across ten models, eight architectures and two immune-receptor domains, it attains distance–performance correlations of |r| = 0.80–0.92 and predicts AUROC/AP/F1 with mean absolute errors of 0.008–0.070. It improves AUROC by up to 0.20 on unseen epitopes and variants, and raises true discovery rates in all five retrospective studies examined (e.g. from 0/5 to 3/5 confirmed neoantigens).
📝 Abstract
Binding prediction models accelerate therapeutic antibody and TCR discovery, but their performance on new datasets is unpredictable, often leading to low discovery rates. Density-ratio methods (PAPE, M-CBPE) provide label-free performance estimation for binary classification, but their assumptions and aggregate-only outputs limit their usefulness for binding prediction on neoepitopes, antigen variants and chemical scaffolds. Here we present CaliPPer (Calibration and Prediction of Performance), a post-hoc framework pairing a multi-chain Sample-to-Domain Distance (S2DD) with distance-aware Bayesian recalibration, operating at three resolutions: generalisability score, aggregate performance prediction, and per-sample confidence. Across ten models, eight architectures and two immune-receptor domains, CaliPPer attains distance–performance correlations $|r|=0.80\text{--}0.92$, predicts AUROC/AP/F1 with mean absolute errors $0.008\text{--}0.070$, and improves AUROC by up to $+0.20$ on unseen epitopes/variants. Applied retrospectively to five published TCR, BCR, MHC–peptide and small-molecule studies, CaliPPer raises true discovery rates in all five (e.g. $0/5 \to 3/5$ confirmed neoantigens), providing a triage layer between computational prediction and experimental validation.
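To make the notion of a distance–performance correlation concrete, the sketch below computes a simple distance from each test sample to a training set and correlates binned distance with accuracy. It is an illustration only and not the paper's S2DD: the mean k-nearest-neighbour Euclidean distance in an embedding space is a stand-in assumption, and the function names, the choice of `k`, the equal-count binning, and the synthetic data are all hypothetical.

```python
import numpy as np


def sample_to_domain_distance(train_emb, test_emb, k=5):
    """Mean Euclidean distance from each test embedding to its k nearest
    training embeddings (a stand-in for a sample-to-domain distance)."""
    train_emb = np.asarray(train_emb, dtype=float)
    test_emb = np.asarray(test_emb, dtype=float)
    # Pairwise distances, shape (n_test, n_train).
    diffs = test_emb[:, None, :] - train_emb[None, :, :]
    dists = np.linalg.norm(diffs, axis=2)
    k = min(k, train_emb.shape[0])
    # np.partition places the k smallest distances in the first k columns.
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)


def distance_performance_correlation(distances, correct, n_bins=5):
    """Pearson r between mean distance and accuracy over equal-count bins
    of samples sorted by distance."""
    distances = np.asarray(distances, dtype=float)
    correct = np.asarray(correct, dtype=float)
    order = np.argsort(distances)
    bins = np.array_split(order, n_bins)
    mean_dist = np.array([distances[b].mean() for b in bins])
    accuracy = np.array([correct[b].mean() for b in bins])
    return np.corrcoef(mean_dist, accuracy)[0, 1]


# Synthetic demo (assumed data): predictions are more often correct
# for test samples that lie close to the training domain.
rng = np.random.default_rng(0)
train = rng.normal(size=(200, 8))
test = rng.normal(scale=2.0, size=(300, 8))
d = sample_to_domain_distance(train, test, k=5)
p_correct = 1.0 / (1.0 + np.exp(d - np.median(d)))
correct = rng.random(300) < p_correct
print("distance-accuracy correlation:", distance_performance_correlation(d, correct))
```

A strongly negative correlation in such a check would indicate that accuracy degrades as samples move away from the training domain, which is the kind of relationship a distance-based generalisability score relies on.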