CaliPPer: quantifying, predicting and improving AI model performance for binding prediction

📅 2026-06-05
📈 Citations: 0
Influential: 0
🤖 AI Summary
AI binding-prediction models perform unreliably on new datasets, which hinders the discovery of therapeutic antibodies and T cell receptors (TCRs). This work proposes CaliPPer, a post-hoc framework for fine-grained, label-free performance estimation. It pairs a multi-chain sample-to-domain distance (S2DD) with distance-aware Bayesian recalibration and operates at three resolutions: a generalizability score, aggregate performance prediction, and per-sample confidence. Across ten models and two immune-receptor domains, it attains distance–performance correlations of |r| = 0.80–0.92 and predicts AUROC/AP/F1 with mean absolute errors of 0.008–0.070. It improves AUROC by up to 0.20 on unseen epitopes and variants and raises true discovery rates in all five retrospective studies examined.
📝 Abstract
Binding prediction models accelerate therapeutic antibody and TCR discovery, but their performance on new datasets is unpredictable, often leading to low discovery rates. Density-ratio methods (PAPE, M-CBPE) provide label-free performance estimation for binary classification, but their assumptions and aggregate-only outputs limit their usefulness for binding prediction on neoepitopes, antigen variants and chemical scaffolds. Here we present CaliPPer (Calibration and Prediction of Performance), a post-hoc framework that pairs a multi-chain Sample-to-Domain Distance (S2DD) with distance-aware Bayesian recalibration and operates at three resolutions: generalisability score, aggregate performance prediction, and per-sample confidence. Across ten models, eight architectures and two immune-receptor domains, CaliPPer attains distance–performance correlations $|r|=0.80\text{--}0.92$, predicts AUROC/AP/F1 with mean absolute errors $0.008\text{--}0.070$, and improves AUROC by up to $+0.20$ on unseen epitopes/variants. Applied retrospectively to five published TCR, BCR, MHC–peptide and small-molecule studies, CaliPPer raises true discovery rates in all five (e.g. $0/5 \to 3/5$ confirmed neoantigens), providing a triage layer between computational prediction and experimental validation.
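To make the quantities in the abstract concrete, here is a minimal Python sketch. It is not the CaliPPer implementation: the abstract does not define S2DD or the recalibration rule, so `nearest_train_distance` (a nearest-neighbour Euclidean distance in an embedding space) and `shrink_toward_prior` (exponential shrinkage with a hypothetical temperature `tau`) are illustrative stand-ins chosen for this example. Only `pearson_r` and `mean_absolute_error` correspond directly to the reported statistics, namely the distance–performance correlation and the error of the predicted AUROC/AP/F1.

```python
import numpy as np


def nearest_train_distance(test_emb, train_emb):
    """Euclidean distance from each test embedding to its nearest training
    embedding. A hypothetical stand-in for a sample-to-domain distance."""
    test_emb = np.asarray(test_emb, dtype=float)
    train_emb = np.asarray(train_emb, dtype=float)
    # Pairwise differences, shape (n_test, n_train, n_features).
    diffs = test_emb[:, None, :] - train_emb[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min(axis=1)


def shrink_toward_prior(prob, dist, prior=0.5, tau=1.0):
    """Pull raw scores toward a prior as distance grows.
    The weight exp(-dist / tau) is an assumption made for illustration."""
    weight = np.exp(-np.asarray(dist, dtype=float) / tau)
    return weight * np.asarray(prob, dtype=float) + (1.0 - weight) * prior


def pearson_r(x, y):
    """Pearson correlation, e.g. between distance and observed performance."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def mean_absolute_error(predicted, observed):
    """Mean absolute error between predicted and observed metric values."""
    predicted = np.asarray(predicted, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(np.abs(predicted - observed)))


if __name__ == "__main__":
    train = [[0.0, 0.0], [1.0, 0.0]]
    test = [[0.0, 0.0], [4.0, 0.0]]
    dist = nearest_train_distance(test, train)
    print(dist)                                  # distance of each test sample
    print(shrink_toward_prior([0.9, 0.9], dist)) # far sample moves toward 0.5
```

In this toy setup a test sample that coincides with a training sample keeps its raw score, while a distant one is pulled toward the prior. That mirrors the intuition of per-sample confidence that depends on distance, without claiming to reproduce the paper's formula.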
Problem

Research questions and friction points this paper is trying to address.

binding prediction
model performance estimation
generalization
neoepitopes
antigen variants
Innovation

Methods, ideas, or system contributions that make the work stand out.

CaliPPer
Sample-to-Domain Distance
Bayesian recalibration
binding prediction
performance estimation
Jian-Qing Zheng
University of Oxford
Biomedical Data Analysis, Medical Image Computing, Image-Guided Interventions, AI for Biomedicine
Hantao Lou
Peking University
AI Alignment, AI Safety, Interpretability, Trustworthy AI
Zinan Yin
Chinese Academy of Medical Science (CAMS) Oxford Institute (COI), University of Oxford, Oxford, UK
Sam Farrar
Chinese Academy of Medical Science (CAMS) Oxford Institute (COI), University of Oxford, Oxford, UK
Yuze Zhou
Chinese Academy of Medical Science (CAMS) Oxford Institute (COI), University of Oxford, Oxford, UK; Center for Translational Immunology, Nuffield Department of Medicine, University of Oxford, UK
Elie Antoun
Chinese Academy of Medical Science (CAMS) Oxford Institute (COI), University of Oxford, Oxford, UK; Center for Translational Immunology, Nuffield Department of Medicine, University of Oxford, UK
Xiangxi Wang
Key Laboratory of Infection and Immunity, National Laboratory of Macromolecules, Institute of Biophysics, Chinese Academy of Sciences, Beijing, CN
Xuetao Cao
Chinese Academy of Medical Science (CAMS) Oxford Institute (COI), University of Oxford, Oxford, UK; State Key Laboratory of Medicinal Chemical Biology, Institute of Immunology, College of Life Sciences, Nankai University, Tianjin, CN; Department of Immunology, Center for Immunotherapy, Peking Union Medical College, Chinese Academy of Medical Sciences, Beijing, CN
Tao Dong
Google
Human-Computer Interaction, User Experience Research, Developer Experience, Programming Tools