TeroSeek: An AI-Powered Knowledge Base and Retrieval Generation Platform for Terpenoid Research

📅 2025-05-27

🤖 AI Summary
Problem: Terpenoid knowledge is fragmented across disparate sources, which hinders cross-disciplinary integration and application. Method: We constructed the first domain-specific knowledge base and retrieval-augmented generation (RAG) platform for terpenoid research. Integrating over 20 years of heterogeneous literature, we applied terpenoid-specific nomenclature standardization, structured information extraction, and expert validation to build a high-quality knowledge graph. We further designed a terpenoid-semantic-aware RAG system that incorporates domain-adapted retrieval and generation modules. Contribution/Results: The platform supports precise, multi-dimensional queries (covering chemical structures, biological activities, and molecular targets) and returns traceable, high-confidence answers. In evaluation, the RAG system significantly outperforms general-purpose large language models, with a 32.7% absolute accuracy gain. The fully open-source system is publicly deployed, offering real-time web access and RESTful API integration for the global research community.
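
The retrieve-then-generate pattern described in the summary can be illustrated with a minimal sketch. It assumes a toy in-memory knowledge base and plain bag-of-words cosine similarity, standing in for TeroSeek's domain-adapted retriever, whose details the summary does not give. The names `Record`, `KB`, `retrieve`, and `build_prompt` are hypothetical, the source IDs are placeholders, and the call to a language model is left out.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    source_id: str  # placeholder citation key, kept so answers stay traceable
    text: str


# Toy knowledge base (illustrative entries, not taken from TeroSeek).
KB = [
    Record("ref-1", "Artemisinin is a sesquiterpene lactone with antimalarial activity."),
    Record("ref-2", "Paclitaxel is a diterpenoid that stabilizes microtubules by binding beta-tubulin."),
    Record("ref-3", "Limonene is a monoterpene found in citrus peel oils."),
]

STOPWORDS = {"a", "by", "in", "is", "of", "that", "the", "what", "with"}


def tokenize(text: str) -> Counter:
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(w for w in words if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 2) -> list[Record]:
    """Return up to k records with a nonzero similarity to the query."""
    q = tokenize(query)
    scored = [(cosine(q, tokenize(r.text)), r) for r in KB]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:k]]


def build_prompt(query: str, records: list[Record]) -> str:
    """Assemble a prompt that carries source IDs so the answer can cite them."""
    context = "\n".join(f"[{r.source_id}] {r.text}" for r in records)
    return (
        "Answer using only the sources below and cite their IDs.\n\n"
        f"Sources:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
```

Calling `build_prompt(q, retrieve(q))` with a question such as `"What is the molecular target of paclitaxel?"` yields a prompt whose context is limited to matching records, each tagged with its source ID. That tagging is one simple way to keep a generated answer traceable to the literature it came from.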

📝 Abstract
Terpenoids are a crucial class of natural products that have been studied for over 150 years, but their interdisciplinary nature (spanning chemistry, pharmacology, and biology) complicates knowledge integration. To address this, the authors developed TeroSeek, a curated knowledge base (KB) built from two decades of terpenoid literature, coupled with an AI-powered question-answering chatbot and web service. Leveraging a retrieval-augmented generation (RAG) framework, TeroSeek provides structured, high-quality information and outperforms general-purpose large language models (LLMs) in terpenoid-related queries. It serves as a domain-specific expert tool for multidisciplinary research and is publicly available at http://teroseek.qmclab.com.
Problem

Research questions and friction points this paper is trying to address.

Integrating interdisciplinary terpenoid knowledge spanning chemistry, pharmacology, and biology
Providing structured, high-quality information for terpenoid research queries
Developing a domain-specific AI tool to outperform general-purpose LLMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

AI-powered knowledge base for terpenoids
Retrieval-augmented generation (RAG) framework
Domain-specific expert chatbot service
Xu Kang
School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, P.R. China
Siqi Jiang
School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, P.R. China
Kangwei Xu
PhD Candidate, Technical University of Munich | EDA
Electronic Design Automation, High-Level Synthesis, VLSI, AI for EDA
Jiahao Li
School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, P.R. China
Ruibo Wu
School of Pharmaceutical Sciences, Sun Yat-sen University, Guangzhou 510006, P.R. China