🤖 AI Summary
Problem: Terpenoid knowledge is fragmented across disparate sources, which hinders cross-disciplinary integration and application.
Method: We constructed the first domain-specific knowledge base and retrieval-augmented generation (RAG) platform for terpenoid research. By integrating over 20 years of heterogeneous literature, we applied terpenoid-specific nomenclature standardization, structured information extraction, and expert validation to build a high-quality knowledge graph. We further designed a terpenoid-semantic-aware RAG system incorporating domain-adapted retrieval and generation modules.
Contribution/Results: The platform enables precise, multi-dimensional querying (including chemical structures, biological activities, and molecular targets) and delivers traceable, high-confidence answers. Evaluation shows our RAG system significantly outperforms general-purpose large language models, achieving a 32.7% absolute accuracy gain. The fully open-source system is publicly deployed, offering real-time web access and RESTful API integration for the global research community.
📝 Abstract
Terpenoids are a crucial class of natural products that have been studied for over 150 years, but their interdisciplinary nature (spanning chemistry, pharmacology, and biology) complicates knowledge integration. To address this, the authors developed TeroSeek, a curated knowledge base (KB) built from two decades of terpenoid literature, coupled with an AI-powered question-answering chatbot and web service. Built on a retrieval-augmented generation (RAG) framework, TeroSeek provides structured, high-quality information and outperforms general-purpose large language models (LLMs) on terpenoid-related queries. It serves as a domain-specific expert tool for multidisciplinary research and is publicly available at http://teroseek.qmclab.com.
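To make the retrieval-augmented generation idea concrete, the following is a minimal sketch of the general pattern: normalize compound names, retrieve the best-matching knowledge-base entries, and assemble a prompt that asks the model to cite its sources. It is not TeroSeek's implementation. The three-entry knowledge base, the `SYNONYMS` map, the token-overlap scoring, and the function names `tokenize`, `retrieve`, and `build_prompt` are all assumptions made for illustration, and the final call to a language model is left out.

```python
import re
from collections import Counter

# Hypothetical toy knowledge base; entries are illustrative, not taken from TeroSeek.
KB = [
    {"id": "doc1", "text": "Artemisinin is a sesquiterpene lactone with antimalarial activity."},
    {"id": "doc2", "text": "Paclitaxel is a diterpenoid that stabilizes microtubules."},
    {"id": "doc3", "text": "Limonene is a monoterpene found in citrus peel oil."},
]

# Assumed stand-in for nomenclature standardization: map synonyms to one name.
SYNONYMS = {"taxol": "paclitaxel", "qinghaosu": "artemisinin"}


def tokenize(text):
    """Lowercase, split into alphanumeric tokens, and normalize known synonyms."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [SYNONYMS.get(t, t) for t in tokens]


def overlap_score(query_tokens, doc_tokens):
    """Count shared tokens (a simple stand-in for a real retrieval model)."""
    q, d = Counter(query_tokens), Counter(doc_tokens)
    return sum(min(q[t], d[t]) for t in q)


def retrieve(query, kb, k=2):
    """Return up to k entries with non-zero overlap, best match first."""
    q = tokenize(query)
    scored = [(overlap_score(q, tokenize(e["text"])), e) for e in kb]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:k]]


def build_prompt(query, passages):
    """Assemble a prompt whose sources carry ids, so answers stay traceable."""
    context = "\n".join(f"[{p['id']}] {p['text']}" for p in passages)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"Sources:\n{context}\n"
        f"Question: {query}"
    )


if __name__ == "__main__":
    question = "What does taxol target?"
    print(build_prompt(question, retrieve(question, KB, k=1)))
```

Tagging each retrieved passage with its id is what lets a generated answer point back to its source. A production system would likely replace the token-overlap score with a domain-adapted retriever and pass the prompt to an LLM.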