LLMs4SchemaDiscovery: A Human-in-the-Loop Workflow for Scientific Schema Mining with Large Language Models

📅 2025-04-01

🤖 AI Summary

Problem: Traditional schema mining for structuring scientific text relies heavily on semi-structured data and scales poorly. Method: This paper proposes a human-in-the-loop, iterative schema discovery framework. Its core innovation, described as the first of its kind, is an expert-feedback-driven schema refinement mechanism based on large language models (LLMs), which integrates domain ontology alignment and manual verification to generate semantically rich, interpretable schemas. The framework combines LLM-powered attribute extraction, semantic clustering, and an interactive feedback interface. Contribution/Results: Evaluated on atomic layer deposition (ALD) in materials science, the method is reported to improve schema coverage, precision, and domain adaptability. It produces reusable, structured schemas and addresses key limitations of purely data-driven and purely rule-based approaches.

📝 Abstract

Extracting structured information from unstructured text is crucial for modeling real-world processes, but traditional schema mining relies on semi-structured data, limiting scalability. This paper introduces schema-miner, a novel tool that combines large language models with human feedback to automate and refine schema extraction. Through an iterative workflow, it organizes properties from text, incorporates expert input, and integrates domain-specific ontologies for semantic depth. Applied to materials science, specifically atomic layer deposition, schema-miner demonstrates that expert-guided LLMs generate semantically rich schemas suitable for diverse real-world applications.

Problem

Research questions and friction points this paper is trying to address.

Automates schema extraction from unstructured text using LLMs

Incorporates human feedback to refine semantic schema mining

Applies domain-specific ontologies for materials science applications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines LLMs with human feedback for schema extraction

Iterative workflow integrates expert input and ontologies

Generates semantically rich schemas for domain applications
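
The iterative workflow summarized above can be illustrated with a minimal sketch. This is not schema-miner's actual API: the names Schema, propose_properties, and refine_schema, the keyword vocabulary, and the ontology identifiers are all hypothetical, and a simple keyword matcher stands in for the LLM call so the example runs on its own. It only shows the general shape of the loop: propose properties from text, let an expert accept or reject them, then link accepted properties to ontology terms.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Schema:
    # property name -> human-readable description
    properties: dict[str, str] = field(default_factory=dict)
    # property name -> ontology term identifier (hypothetical IDs)
    ontology_links: dict[str, str] = field(default_factory=dict)


def propose_properties(text: str, schema: Schema) -> dict[str, str]:
    """Stand-in for an LLM call (assumption): a keyword matcher that
    proposes properties not yet present in the schema."""
    vocabulary = {
        "precursor": "Chemical precursor used in the process",
        "temperature": "Deposition temperature",
        "substrate": "Material on which the film is grown",
    }
    lowered = text.lower()
    return {
        name: desc
        for name, desc in vocabulary.items()
        if name in lowered and name not in schema.properties
    }


def refine_schema(
    documents: list[str],
    review: Callable[[dict[str, str]], dict[str, str]],
    ontology: dict[str, str],
    max_rounds: int = 3,
) -> Schema:
    """Iterate: propose candidates, collect expert feedback, align to ontology."""
    schema = Schema()
    for _ in range(max_rounds):
        changed = False
        for doc in documents:
            candidates = propose_properties(doc, schema)
            accepted = review(candidates)  # expert keeps or drops candidates
            for name, desc in accepted.items():
                schema.properties[name] = desc
                if name in ontology:
                    schema.ontology_links[name] = ontology[name]
                changed = True
        if not changed:  # stop once a full pass adds nothing new
            break
    return schema


# Usage with made-up inputs: the "expert" rejects the substrate property.
docs = [
    "The precursor was pulsed at a temperature of 200 C.",
    "Films were grown on a silicon substrate.",
]
ontology = {"temperature": "EX:0001"}  # hypothetical ontology term ID
reviewer = lambda cands: {k: v for k, v in cands.items() if k != "substrate"}
schema = refine_schema(docs, reviewer, ontology)
print(sorted(schema.properties))
print(schema.ontology_links)
```

In a real setting the review callback would be an interactive step with a domain expert, and the proposal function would prompt an LLM with the current schema and the new text; both are reduced to plain functions here so the control flow stays visible.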
