RAGPPI: RAG Benchmark for Protein-Protein Interactions in Drug Discovery

📅 2025-05-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
A dedicated benchmark for evaluating the biological impact of protein–protein interactions (PPIs) in drug discovery has been absent. Method: The authors introduce RAGPPI, the first RAG-specific evaluation benchmark for PPIs, comprising 4,420 high-quality, factually grounded question-answer (QA) pairs. Their methodology combines expert-driven curation with large language model (LLM)-assisted generation in a hybrid gold/silver-standard data construction paradigm: domain expert interviews define QA quality criteria, experts manually annotate a 500-pair gold-standard set, and an ensemble auto-evaluation LLM that mirrors expert labeling behavior scales construction to a 3,720-pair silver-standard set. Contribution/Results: RAGPPI provides evaluation infrastructure for knowledge retrieval and reasoning in target identification, and the authors commit to maintaining it as a resource for the drug discovery QA community.

📝 Abstract
Retrieving the biological impacts of protein-protein interactions (PPIs) is essential for target identification (Target ID) in drug development. Given the vast number of proteins involved, this process remains time-consuming and challenging. Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG) frameworks have supported Target ID; however, no benchmark currently exists for identifying the biological impacts of PPIs. To bridge this gap, we introduce the RAG Benchmark for PPIs (RAGPPI), a factual benchmark of 4,420 question-answer (QA) pairs focusing on the potential biological impacts of PPIs. Through interviews with experts, we identified criteria for a benchmark dataset, such as the type of QA and its sources. We built a gold-standard dataset (500 QA pairs) through expert-driven data annotation. We developed an ensemble auto-evaluation LLM that reflects expert labeling characteristics, which facilitated the construction of a silver-standard dataset (3,720 QA pairs). We are committed to maintaining RAGPPI as a resource to support the research community in advancing RAG systems for drug discovery QA solutions.
Problem

Research questions and friction points this paper is trying to address.

Lack of benchmark for PPI biological impact identification
Time-consuming PPI analysis in drug target identification
Need for expert-validated QA dataset in drug discovery
Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces RAGPPI benchmark for PPI impacts
Uses expert-driven gold-standard dataset annotation
Develops ensemble auto-evaluation LLM for QA
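The ensemble auto-evaluation idea above (several evaluators vote on each candidate QA pair, and the majority decides whether it enters the silver-standard set) can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the stub judge functions stand in for LLM evaluators calibrated against expert labels, and all names are hypothetical.

```python
from collections import Counter

def ensemble_label(judge_votes):
    """Aggregate per-judge accept/reject votes by simple majority."""
    return Counter(judge_votes).most_common(1)[0][0]

# Hypothetical stand-ins for LLM judges; real judges would score
# factual grounding against retrieved PPI literature.
def judge_mentions_impact(qa):
    return "impact" in qa["answer"].lower()

def judge_answer_length(qa):
    return len(qa["answer"].split()) >= 5

def judge_nonempty(qa):
    return bool(qa["question"]) and bool(qa["answer"])

def evaluate_qa(qa, judges):
    """Run every judge on a QA pair and return the majority vote."""
    return ensemble_label([judge(qa) for judge in judges])

qa_pair = {
    "question": "What is the biological impact of the TP53-MDM2 interaction?",
    "answer": "MDM2 binding suppresses TP53, reducing its tumor-suppressive impact.",
}
judges = [judge_mentions_impact, judge_answer_length, judge_nonempty]
print(evaluate_qa(qa_pair, judges))  # True: all three stub judges accept
```

Majority voting is one simple aggregation choice; weighting judges by their agreement with the expert-annotated gold standard would be a natural refinement.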
Youngseung Jeon
University of California, Los Angeles
Ziwen Li
University of California, Los Angeles
Thomas Li
Palo Alto High School
JiaSyuan Chang
University of California, Los Angeles
Morteza Ziyadi
Amazon AGI
Xiang 'Anthony' Chen
Associate Professor, UCLA
Human-Computer Interaction