CONFIDE: Hallucination Assessment for Reliable Biomolecular Structure Prediction and Design

📅 2025-11-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing protein structure reliability metrics (e.g., pLDDT) capture energetic stability but often miss subtle errors, such as atomic clashes and conformational traps, that reflect topological frustration in the folding energy landscape. To address this, we present CODE (Chain of Diffusion Embeddings), a self-evaluating metric that quantifies topological frustration in an unsupervised manner from the latent diffusion embeddings of AlphaFold3-series structure predictors. CONFIDE then integrates CODE with pLDDT into a unified reliability framework that covers both energetic and topological perspectives. CODE achieves a correlation of 0.82 with protein folding rates, versus 0.33 for pLDDT (a 148% relative improvement). On molecular glue structure prediction benchmarks, CONFIDE attains a Spearman correlation of 0.73 with RMSD, versus 0.42 for pLDDT (a 73.8% relative improvement). The approach also applies to diverse drug design tasks, including all-atom binder design, enzymatic active site mapping, mutation-induced binding affinity prediction, nucleic acid aptamer screening, and flexible protein modeling.
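As a rough illustration of how a Spearman correlation between a reliability score and RMSD could be computed, the following is a minimal pure-Python sketch. The function names and the toy numbers are hypothetical and are not taken from the paper. The sign convention is also an assumption: here a higher score is treated as higher confidence, so it correlates negatively with RMSD.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy, made-up values for illustration only (not data from the paper).
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5]   # hypothetical confidence scores
toy_rmsd = [1.0, 1.5, 1.2, 3.0, 4.0]     # hypothetical RMSD values
rho = spearman(toy_scores, toy_rmsd)
```

Because Spearman correlation depends only on ranks, it measures whether the score orders predictions the same way RMSD does, regardless of the score's scale.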

📝 Abstract
Reliable evaluation of protein structure predictions remains challenging, as metrics like pLDDT capture energetic stability but often miss subtle errors such as atomic clashes or conformational traps reflecting topological frustration within the protein folding energy landscape. We present CODE (Chain of Diffusion Embeddings), a self-evaluating metric empirically found to quantify topological frustration directly from the latent diffusion embeddings of the AlphaFold3 series of structure predictors in a fully unsupervised manner. Integrating this with pLDDT, we propose CONFIDE, a unified evaluation framework that combines energetic and topological perspectives to improve the reliability of AlphaFold3 and related models. CODE strongly correlates with protein folding rates driven by topological frustration, achieving a correlation of 0.82 compared to pLDDT's 0.33 (a relative improvement of 148%). CONFIDE significantly enhances the reliability of quality evaluation in molecular glue structure prediction benchmarks, achieving a Spearman correlation of 0.73 with RMSD, compared to pLDDT's correlation of 0.42, a relative improvement of 73.8%. Beyond quality assessment, our approach applies to diverse drug design tasks, including all-atom binder design, enzymatic active site mapping, mutation-induced binding affinity prediction, nucleic acid aptamer screening, and flexible protein modeling. By combining data-driven embeddings with theoretical insight, CODE and CONFIDE outperform existing metrics across a wide range of biomolecular systems, offering robust and versatile tools to refine structure predictions, advance structural biology, and accelerate drug discovery.
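The relative improvements quoted in the abstract can be checked from the reported correlations. The sketch below assumes the usual definition, (new - baseline) / baseline; the helper name is hypothetical.

```python
def relative_improvement(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Correlations as reported in the abstract.
folding_gain = relative_improvement(0.82, 0.33)  # CODE vs. pLDDT, folding rates
glue_gain = relative_improvement(0.73, 0.42)     # CONFIDE vs. pLDDT, molecular glue RMSD

print(f"folding rates: {folding_gain:.1f}%")
print(f"molecular glue: {glue_gain:.1f}%")
```

Under this definition the folding-rate figure comes out just under 148.5%, consistent with the reported 148%, and the molecular glue figure rounds to the reported 73.8%.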
Problem

Research questions and friction points this paper is trying to address.

Improves reliability of protein structure prediction evaluation.
Combines energetic and topological perspectives for assessment.
Enhances quality evaluation in biomolecular design tasks.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-evaluating metric using diffusion embeddings for topological frustration.
Unified framework combining energetic and topological perspectives for reliability.
Versatile application across diverse biomolecular systems and drug design.
Zijun Gao
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong.
Mutian He
Faculty of Applied Sciences, Macao Polytechnic University, Macao.
Shijia Sun
Faculty of Applied Sciences, Macao Polytechnic University, Macao.
Hanqun Cao
The Chinese University of Hong Kong
Generative Modeling, AI4Science
Jingjie Zhang
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong.
Zihao Luo
University of Electronic Science and Technology of China | Shanghai Innovation Institute
Medical Image Analysis, Foundation Model, AI for Science
Xiaorui Wang
Professor of Computer Engineering, The Ohio State University
Power Management, Data Centers, Real-Time Embedded Systems, Computer Architecture, Computer Systems
Xiaojun Yao
Faculty of Applied Sciences, Macao Polytechnic University, Macao.
Chang-Yu Hsieh
Zhejiang University
Open Quantum Systems, Quantum Simulations, AI for Science
Chunbin Gu
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong.
P. Heng
Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong.