Exploring a Large Language Model for Transforming Taxonomic Data into OWL: Lessons Learned and Implications for Ontology Development

📅 2025-03-01
🏛️ Data Intelligence
🤖 AI Summary
Problem: Manually maintaining species names in agricultural ontologies is costly and cannot keep pace with ongoing taxonomic revisions. Method: This paper investigates whether a large language model (LLM) can generate OWL ontologies directly, using two alternative prompting approaches: (1) a series of prompts that ChatGPT-4 executes through the BrowserOP plugin, and (2) a Python script designed by ChatGPT-4 for batch generation. Using ChatGPT-4 and the GBIF Backbone API, the authors build the :Organism module of the Agricultural Product Types Ontology (APTO) as OWL 2 files. Contribution/Results: The first approach showed scalability limitations; the second scales to batches of over one thousand species with a reported accuracy above 85%, although it struggled with typographical errors in data handling. The work shows that LLMs can help manage evolving taxonomic data and lower the barrier to automating parts of ontology engineering.

📝 Abstract
Managing scientific names in ontologies that represent species taxonomies is challenging because these taxonomies are continually revised. Manual maintenance becomes increasingly difficult when thousands of scientific names are involved. To address this issue, this paper investigates the use of ChatGPT-4 to automate the development of the :Organism module in the Agricultural Product Types Ontology (APTO) for species classification. Our methodology involved leveraging ChatGPT-4 to extract data from the GBIF Backbone API and generate OWL files for further integration into APTO. Two alternative approaches were explored: (1) issuing a series of prompts for ChatGPT-4 to execute tasks via the BrowserOP plugin and (2) directing ChatGPT-4 to design a Python algorithm to perform analogous tasks. Both approaches rely on a prompting method in which we provide instructions, context, input data, and an output indicator. The first approach showed scalability limitations; the second used the Python algorithm to overcome them but struggled with typographical errors in data handling. This study highlights the potential of large language models like ChatGPT-4 to streamline the management of species names in ontologies. Despite certain limitations, these tools offer promising advancements in automating taxonomy-related tasks and improving the efficiency of ontology development.
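The four-part prompting method mentioned in the abstract (instructions, context, input data, output indicator) can be pictured as a small template. The sketch below is only an illustration: the class name, the labels, and the example wording are assumptions, not the prompts used in the paper.

```python
from dataclasses import dataclass


@dataclass
class Prompt:
    """Holds the four prompt elements named in the abstract."""

    instruction: str
    context: str
    input_data: str
    output_indicator: str

    def render(self) -> str:
        # Join the four elements into one labeled message.
        # The labels are illustrative, not taken from the paper.
        parts = [
            ("Instruction", self.instruction),
            ("Context", self.context),
            ("Input data", self.input_data),
            ("Output indicator", self.output_indicator),
        ]
        return "\n\n".join(f"{label}:\n{text}" for label, text in parts)


# Hypothetical example wording for each element.
prompt = Prompt(
    instruction="Retrieve the taxonomic classification for each scientific name.",
    context="The result will be integrated into the :Organism module of APTO.",
    input_data="Zea mays\nGlycine max",
    output_indicator="Return OWL classes, one per taxon.",
)
prompt_text = prompt.render()
```

Keeping the four elements as separate fields makes it easy to reuse the same instruction and output indicator while swapping in a new batch of names.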
Problem

Research questions and friction points this paper is trying to address.

Automating species name management in ontologies using ChatGPT-4
Transforming taxonomic data into OWL for ontology development
Addressing scalability and data errors in automated taxonomy tasks
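The abstract does not say where the typographical errors arose or how they were handled. One possible guard, sketched here as an assumption rather than the authors' method, is to flag any name that the GBIF name-matching service did not match exactly. The `matchType` and `confidence` fields follow the shape of GBIF species match responses; the function name, the threshold, and the sample records are hypothetical.

```python
def needs_review(match: dict, min_confidence: int = 90) -> bool:
    """Return True when a match result should be checked by a curator."""
    match_type = match.get("matchType", "NONE")
    confidence = match.get("confidence", 0)
    # Anything other than an exact match, or a low-confidence match,
    # may hide a misspelled or ambiguous input name.
    return match_type != "EXACT" or confidence < min_confidence


# Hand-written records shaped like GBIF match responses (not real API output).
exact = {"canonicalName": "Zea mays", "matchType": "EXACT", "confidence": 99}
fuzzy = {"canonicalName": "Zea mays", "matchType": "FUZZY", "confidence": 92}
unmatched = {"matchType": "NONE", "confidence": 100}

flags = [needs_review(m) for m in (exact, fuzzy, unmatched)]
```

Flagged names could be set aside for manual correction instead of being written into the ontology.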
Innovation

Methods, ideas, or system contributions that make the work stand out.

Using ChatGPT-4 to automate OWL file generation
Leveraging Python algorithms for scalable data processing
Integrating GBIF Backbone API for taxonomic data extraction
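As a rough illustration of turning a GBIF classification into OWL, the sketch below converts one record into a chain of subclasses in Turtle syntax. It is not the script produced in the paper. The record is written by hand to mirror the rank fields GBIF returns (the network call is left out so the example runs offline), the base IRI is a placeholder rather than APTO's real namespace, and modeling each taxon as an `owl:Class` is an assumption.

```python
RANKS = ["kingdom", "phylum", "class", "order", "family", "genus", "species"]
BASE = "http://example.org/apto#"  # placeholder IRI, not APTO's real namespace


def local_name(name: str) -> str:
    # Simplified: only spaces are replaced. Names with other special
    # characters (for example hybrid markers) would need extra escaping.
    return name.strip().replace(" ", "_")


def record_to_turtle(record: dict) -> str:
    """Build a subclass chain from kingdom down to species under :Organism."""
    lines = [
        f"@prefix : <{BASE}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
        ":Organism a owl:Class .",
    ]
    parent = "Organism"
    for rank in RANKS:
        name = record.get(rank)
        if not name:
            continue  # skip ranks missing from the record
        child = local_name(name)
        lines.append(
            f':{child} a owl:Class ; rdfs:subClassOf :{parent} ; rdfs:label "{name}" .'
        )
        parent = child
    return "\n".join(lines) + "\n"


# Hand-written record using GBIF-style rank fields.
record = {
    "kingdom": "Plantae",
    "phylum": "Tracheophyta",
    "class": "Liliopsida",
    "order": "Poales",
    "family": "Poaceae",
    "genus": "Zea",
    "species": "Zea mays",
}
turtle = record_to_turtle(record)
```

Running the function over a list of records and merging the results would give the kind of batch output the second approach aims for; duplicate class declarations for shared higher ranks would then need to be removed.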
Filipi Miranda Soares
National Research Institute for Agriculture, Food and Environment (INRAE), France
Semantic Web, Biodiversity informatics, Digital Agriculture, Citizen Science, User Experience
Antonio Mauro Saraiva
University of Sao Paulo, Polytechnic School, Computer Engineering and Digital Systems, Sao Paulo, 05508-010, Brazil; University of Sao Paulo, Center for Artificial Intelligence (C4AI), Sao Paulo, 05508-020, Brazil
Luís Ferreira Pires
University of Twente, Faculty of Electrical Engineering, Mathematics and Computer Science, Semantics, Cybersecurity & Services Group, Enschede, 7522 NB, Netherlands
Luiz Olavo Bonino da Silva Santos
Semantics, Cybersecurity & Services (SCS), EEMCS, University of Twente, the Netherlands
FAIR, Ontology, SOA, applied ontology
Dilvan de Abreu Moreira
University of São Paulo
semantic web, biomedical informatics, java
Fernando Elias Corrêa
University of Sao Paulo, Luiz de Queiroz College of Agriculture, Center for Advanced Studies on Applied Economics, Piracicaba, 13400-970, Brazil; University of Sao Paulo, Center for Artificial Intelligence (C4AI), Sao Paulo, 05508-020, Brazil
K. Braghetto
University of Sao Paulo, Institute of Mathematics and Statistics, Sao Paulo, 05508-090, Brazil; University of Sao Paulo, Center for Artificial Intelligence (C4AI), Sao Paulo, 05508-020, Brazil
Debora P. Drucker
Embrapa Digital Agriculture, Campinas, 13083-886, Brazil; University of Sao Paulo, Center for Artificial Intelligence (C4AI), Sao Paulo, 05508-020, Brazil
Alexandre Cláudio Botazzo Delbem
University of Sao Paulo, Center for Artificial Intelligence (C4AI), Sao Paulo, 05508-020, Brazil