Large Language Models for Scholarly Ontology Generation: An Extensive Analysis in the Engineering Field

📅 2024-12-11
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
To address the high annotation cost, slow updates, and poor generalizability of manual academic ontology construction, this paper systematically investigates the ability of large language models (LLMs) to identify semantic relations—such as hypernymy/hyponymy and equivalence—between engineering research topics. Leveraging the IEEE Thesaurus, the authors construct a gold-standard dataset and run a zero-shot comparative evaluation of 17 LLMs under four prompting strategies, varying model scale, openness (open vs. proprietary), and quantisation. Lightweight quantised models such as Dolphin-Mistral-7B reach an F1-score of 0.920 after prompt optimisation, approaching the best proprietary model, Claude 3 Sonnet (0.967). This shows that computational efficiency and accuracy can be balanced, enabling resource-efficient, dynamic, and scalable structuring of academic knowledge.

📝 Abstract
Ontologies of research topics are crucial for structuring scientific knowledge, enabling scientists to navigate vast amounts of research, and forming the backbone of intelligent systems such as search engines and recommendation systems. However, manual creation of these ontologies is expensive, slow, and often results in outdated and overly general representations. As a solution, researchers have been investigating ways to automate or semi-automate the process of generating these ontologies. This paper offers a comprehensive analysis of the ability of large language models (LLMs) to identify semantic relationships between different research topics, which is a critical step in the development of such ontologies. To this end, we developed a gold standard based on the IEEE Thesaurus to evaluate the task of identifying four types of relationships between pairs of topics: broader, narrower, same-as, and other. Our study evaluates the performance of seventeen LLMs, which differ in scale, accessibility (open vs. proprietary), and model type (full vs. quantised), while also assessing four zero-shot reasoning strategies. Several models have achieved outstanding results, including Mixtral-8x7B, Dolphin-Mistral-7B, and Claude 3 Sonnet, with F1-scores of 0.847, 0.920, and 0.967, respectively. Furthermore, our findings demonstrate that smaller, quantised models, when optimised through prompt engineering, can deliver performance comparable to much larger proprietary models, while requiring significantly fewer computational resources.
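The task described in the abstract—classifying a pair of research topics into one of four relations (broader, narrower, same-as, other) via zero-shot prompting—can be sketched as follows. The prompt wording, the `ask_model` hook, and the `toy_model` stand-in are illustrative assumptions; the paper's actual prompts and reasoning strategies are not reproduced here.

```python
# Sketch of zero-shot relation classification between two research topics.
# The prompt text and the model stub below are assumptions for illustration,
# not the paper's actual prompts.

LABELS = ("broader", "narrower", "same-as", "other")

def build_prompt(topic_a: str, topic_b: str) -> str:
    # Zero-shot: the model sees only the instruction, no labelled examples.
    return (
        "Classify the semantic relationship between two research topics.\n"
        f'Topic A: "{topic_a}"\nTopic B: "{topic_b}"\n'
        "Answer with exactly one of: broader, narrower, same-as, other.\n"
        "Answer:"
    )

def parse_label(reply: str) -> str:
    # Take the first recognised label in the model's reply; fall back to "other".
    reply = reply.lower()
    for label in LABELS:
        if label in reply:
            return label
    return "other"

def classify(topic_a: str, topic_b: str, ask_model) -> str:
    return parse_label(ask_model(build_prompt(topic_a, topic_b)))

# Stand-in for a real LLM call (e.g. a local quantised model),
# used only to show the plumbing end to end.
def toy_model(prompt: str) -> str:
    return "narrower" if "neural networks" in prompt else "other"

print(classify("machine learning", "neural networks", toy_model))
```

In practice `ask_model` would wrap whichever open or proprietary model is being evaluated; the lenient `parse_label` step matters because free-form model replies do not always contain only the label.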
Problem

Research questions and friction points this paper is trying to address.

Automating scholarly ontology generation to replace manual creation
Evaluating LLMs in identifying semantic relationships between research topics
Optimizing smaller models to match large proprietary models' performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Zero-shot LLM prompting to identify semantic relationships between research topics for ontology construction
Gold-standard relation dataset derived from the IEEE Thesaurus
Prompt-engineered quantised models matching much larger proprietary models
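The F1-scores cited above (0.847, 0.920, 0.967) summarise per-relation performance across the four labels. A minimal stdlib sketch of macro-averaged F1 over gold vs. predicted relation labels; the toy label lists are invented for illustration and are not the paper's data:

```python
from collections import Counter

LABELS = ("broader", "narrower", "same-as", "other")

def macro_f1(gold, pred):
    # Per-label precision/recall/F1, averaged with equal weight per label.
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1
            fn[g] += 1
    f1s = []
    for label in LABELS:
        prec = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        rec = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(LABELS)

# Invented toy labels, just to exercise the metric.
gold = ["broader", "narrower", "same-as", "other", "broader", "other"]
pred = ["broader", "narrower", "other",   "other", "narrower", "other"]
print(round(macro_f1(gold, pred), 3))
```

Macro-averaging weights each of the four relation types equally, which keeps the rarer same-as relation from being swamped by the dominant other class.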