Enhancing Large Language Models with Domain-Specific Knowledge: The Case in Topological Materials

📅 2024-09-10
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
To address the limitations of large language models (LLMs) in domain-specific expertise on topological materials and the prohibitive computational cost of full-parameter fine-tuning, this work introduces MaterialsKG—the first domain-specific knowledge graph for topological quantum matter—constructed by semantically integrating extensive scientific literature. We propose a knowledge graph-enhanced, context-aware prompt learning framework that enables lightweight, scalable, and parameter-efficient adaptation of LLMs without full-parameter fine-tuning. Our approach innovatively couples structured domain knowledge with advanced prompt engineering. Experimental results demonstrate substantial improvements across key tasks: question-answering accuracy increases by 42% over general-purpose LLMs; F1-score for complex relational reasoning reaches 0.81; and the framework supports real-time interactive material discovery with verifiable knowledge provenance.

Technology Category

Application Category

📝 Abstract
Large language models (LLMs), such as ChatGPT, have demonstrated impressive performance in the text generation task, showing the ability to understand and respond to complex instructions. However, the performance of naive LLMs in speciffc domains is limited due to the scarcity of domain-speciffc corpora and specialized training. Moreover, training a specialized large-scale model necessitates signiffcant hardware resources, which restricts researchers from leveraging such models to drive advances. Hence, it is crucial to further improve and optimize LLMs to meet speciffc domain demands and enhance their scalability. Based on the condensed matter data center, we establish a material knowledge graph (MaterialsKG) and integrate it with literature. Using large language models and prompt learning, we develop a specialized dialogue system for topological materials called TopoChat. Compared to naive LLMs, TopoChat exhibits superior performance in structural and property querying, material recommendation, and complex relational reasoning. This system enables efffcient and precise retrieval of information and facilitates knowledge interaction, thereby encouraging the advancement on the ffeld of condensed matter materials.
Problem

Research questions and friction points this paper is trying to address.

Large Language Models
Domain-specific Knowledge
Resource Efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

MaterialsKG Integration
Domain-specific Performance Enhancement
TopoChat Dialog System
H
Huangchao Xu
Computer Network Information Center, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Beijing, China
B
Baohua Zhang
Computer Network Information Center, Chinese Academy of Sciences, Beijing, China
Zhong Jin
Zhong Jin
Professor, School of Chemistry and Chemical Engineering, Nanjing University
Nanomaterials - Carbon Nanotubes - 2D Materials - Graphene - Energy Storage - Nanoelectronics - Nanolithography
T
Tiannian Zhu
Institute of Physics, Chinese Academy of Sciences, Beijing, China
Q
Quansheng Wu
Institute of Physics, Chinese Academy of Sciences, Beijing, China
Hongming Weng
Hongming Weng
Institute of Physics, Chinese Academy of Sciences
Condensed Matter Physics