LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models

📅 2025-04-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Background: Traditional schema mining for structuring scientific text relies heavily on semi-structured data and scales poorly. Method: This paper proposes a human-in-the-loop, iterative schema discovery framework. Its core innovation is the first expert-feedback-driven schema refinement mechanism based on large language models (LLMs); it integrates domain ontology alignment and manual verification to generate semantically rich, interpretable schemas. The framework combines LLM-powered attribute extraction, semantic clustering, and an interactive feedback interface. Contribution/Results: Evaluated on atomic layer deposition (ALD), a materials science domain, the method significantly improves schema coverage, precision, and domain adaptability. It produces highly reusable structured schemas, overcoming key limitations of purely data-driven and purely rule-based approaches.

📝 Abstract
Extracting structured information from unstructured text is crucial for modeling real-world processes, but traditional schema mining relies on semi-structured data, limiting scalability. This paper introduces schema-miner, a novel tool that combines large language models with human feedback to automate and refine schema extraction. Through an iterative workflow, it organizes properties from text, incorporates expert input, and integrates domain-specific ontologies for semantic depth. Applied to materials science, specifically atomic layer deposition (ALD), schema-miner demonstrates that expert-guided LLMs generate semantically rich schemas suitable for diverse real-world applications.
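The iterative workflow in the abstract (organize properties from text, align them with a domain ontology, then revise them with expert input) can be pictured as a simple loop. The Python sketch below is illustrative only and is not schema-miner's implementation: `propose_schema` is a hypothetical stand-in for the LLM step that does plain keyword matching instead of calling a model, and `Feedback`, `align_to_ontology`, `refine`, `scripted_expert`, the vocabulary, and the ontology labels are all assumed names and made-up data.

```python
from dataclasses import dataclass, field
from typing import Callable

# A schema is modeled here as a mapping from property name to a short description.
Schema = dict[str, str]


@dataclass
class Feedback:
    """Hypothetical container for one round of expert feedback."""
    add: Schema = field(default_factory=dict)      # properties the expert adds or corrects
    remove: set[str] = field(default_factory=set)  # properties the expert rejects

    def is_empty(self) -> bool:
        return not self.add and not self.remove


def propose_schema(texts: list[str], vocabulary: Schema) -> Schema:
    """Hypothetical stand-in for the LLM step: keep vocabulary terms found in the texts."""
    found: Schema = {}
    for text in texts:
        lowered = text.lower()
        for term, description in vocabulary.items():
            if term in lowered:
                found[term] = description
    return found


def align_to_ontology(schema: Schema, ontology: dict[str, str]) -> Schema:
    """Rename each property to its ontology label when a mapping exists."""
    return {ontology.get(name, name): desc for name, desc in schema.items()}


def refine(
    texts: list[str],
    vocabulary: Schema,
    ontology: dict[str, str],
    expert: Callable[[Schema], Feedback],
    max_rounds: int = 5,
) -> Schema:
    """Propose a schema, then apply expert feedback until the expert accepts it."""
    schema = align_to_ontology(propose_schema(texts, vocabulary), ontology)
    for _ in range(max_rounds):
        feedback = expert(schema)
        if feedback.is_empty():
            break  # the expert accepts the current schema
        schema = {k: v for k, v in schema.items() if k not in feedback.remove}
        schema.update(feedback.add)
        schema = align_to_ontology(schema, ontology)
    return schema


# Usage with made-up ALD-flavored data (all values are illustrative assumptions).
texts = ["The ALD process used a TMA precursor at a deposition temperature of 200 C."]
vocabulary = {
    "precursor": "chemical precursor",
    "temperature": "deposition temperature",
    "pressure": "reactor pressure",
}
ontology = {"temperature": "depositionTemperature"}  # hypothetical ontology label


def scripted_expert(schema: Schema) -> Feedback:
    """Simulated expert: requests a co-reactant property once, then accepts."""
    if "coReactant" not in schema:
        return Feedback(add={"coReactant": "second reactant, e.g. water"})
    return Feedback()


print(refine(texts, vocabulary, ontology, scripted_expert))
```

The loop stops as soon as the expert returns empty feedback or after `max_rounds` rounds; re-running ontology alignment after each round keeps expert-added properties consistent with the ontology labels.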
Problem

Automates schema extraction from unstructured text using LLMs
Incorporates human feedback to refine semantic schema mining
Applies domain-specific ontologies for materials science applications
Innovation

Combines LLMs with human feedback for schema extraction
Iterative workflow integrates expert input and ontologies
Generates semantically rich schemas for domain applications
Authors
Sameer Sadruddin
TIB Leibniz Information Centre for Science and Technology, Hannover, Germany
Jennifer D'Souza
TIB Leibniz Information Centre for Science and Technology
Natural Language Processing, Scientific Knowledge Extraction, LLM Evaluation, Scientometrics
Eleni Poupaki
TU/e Eindhoven University of Technology, Netherlands
Alex Watkins
University of Warwick, United Kingdom
Hamed Babaei Giglou
TIB Leibniz Information Centre for Science and Technology
NLP, LLMs, Reinforcement Learning, Ontology Engineering, Semantic Web
Anisa Rula
University of Brescia, Italy
Bora Karasulu
University of Warwick, United Kingdom
Sören Auer
TIB Leibniz Information Centre for Science and Technology, Hannover, Germany; L3S Research Center, Leibniz University of Hannover, Germany
Adrie Mackus
TU/e Eindhoven University of Technology, Netherlands
Erwin Kessels
TU/e Eindhoven University of Technology, Netherlands