🤖 AI Summary
Traditional schema mining for structuring scientific text relies heavily on semi-structured data and scales poorly. Method: This paper proposes a human-in-the-loop framework for iterative schema discovery. Its core innovation, presented as the first of its kind, is an expert-feedback-driven schema refinement mechanism based on large language models (LLMs) that integrates domain ontology alignment and manual verification to generate semantically rich, interpretable schemas. The framework combines LLM-powered attribute extraction, semantic clustering, and an interactive feedback interface. Contribution/Results: Evaluated in the atomic layer deposition (ALD) domain of materials science, the method is reported to improve schema coverage, precision, and domain adaptability. It produces reusable, structured schemas and addresses key limitations of purely data-driven and purely rule-based approaches.
📝 Abstract
Extracting structured information from unstructured text is crucial for modeling real-world processes, but traditional schema mining relies on semi-structured data, limiting scalability. This paper introduces schema-miner, a novel tool that combines large language models with human feedback to automate and refine schema extraction. Through an iterative workflow, it organizes properties from text, incorporates expert input, and integrates domain-specific ontologies for semantic depth. Applied to materials science, specifically atomic layer deposition, schema-miner demonstrates that expert-guided LLMs generate semantically rich schemas suitable for diverse real-world applications.
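The iterative workflow described in the abstract (propose properties from text, merge them into the schema, apply expert feedback, repeat) can be illustrated with a minimal Python sketch. This is not schema-miner's actual API: the names `refine_schema`, `toy_propose`, and `toy_review` are hypothetical, the LLM call is replaced by a keyword lookup, the expert by a hard-coded rule, and the ontology alignment step is omitted.

```python
from typing import Callable, Dict, Iterable

# A schema is modeled here as: property name -> short description (an assumption).
Schema = Dict[str, str]


def refine_schema(
    documents: Iterable[str],
    propose: Callable[[str, Schema], Schema],
    review: Callable[[Schema], Schema],
) -> Schema:
    """Run one propose/merge/review round per document and return the final schema."""
    schema: Schema = {}
    for text in documents:
        candidates = propose(text, schema)   # stand-in for LLM property extraction
        merged = {**schema, **candidates}    # add the new candidates to the schema
        schema = review(merged)              # stand-in for expert feedback
    return schema


def toy_propose(text: str, schema: Schema) -> Schema:
    """Hypothetical extractor: a keyword lookup in place of a real LLM call."""
    known = {
        "precursor": "chemical precursor used in the process",
        "temperature": "deposition temperature",
        "substrate": "material being coated",
    }
    lowered = text.lower()
    return {k: v for k, v in known.items() if k in lowered and k not in schema}


def toy_review(schema: Schema) -> Schema:
    """Hypothetical expert: rejects 'substrate' and renames 'temperature'."""
    reviewed: Schema = {}
    for name, description in schema.items():
        if name == "substrate":
            continue  # expert rejects this property
        if name == "temperature":
            name = "deposition_temperature"  # expert asks for a more specific name
        reviewed[name] = description
    return reviewed


if __name__ == "__main__":
    docs = [
        "The precursor was pulsed at a temperature of 200 C.",
        "Films were grown on a silicon substrate.",
    ]
    print(refine_schema(docs, toy_propose, toy_review))
```

Passing the proposer and reviewer in as callables keeps the loop independent of any particular LLM client or feedback interface, so either side can be swapped for a real implementation without changing the control flow.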