AI Summary
Low accuracy in extracting structured multi-attribute tuples (e.g., mechanical properties) from multicomponent alloy literature arises from strong tuple interdependence and contextual ambiguity. To address this, we propose the first end-to-end joint extraction framework comprising three stages: MatSciBERT-based entity recognition, pointer-network-driven boundary detection, and cross- and intra-entity enhanced attention mechanisms. Unlike conventional sequence-labeling approaches, our method explicitly models complex tuple relationships through joint training. Evaluated on standardized 1–4-argument tuple datasets, it achieves F1 scores of 0.963–0.753; on a newly constructed random dataset, it attains an F1 of 0.854, substantially outperforming existing methods. This work provides a high-precision, scalable technical foundation for materials science text structuring and data-driven materials design.
Abstract
Extracting high-quality structured information from scientific literature is crucial for advancing material design through data-driven methods. Despite considerable research in natural language processing for dataset extraction, effective approaches for multi-tuple extraction in scientific literature remain scarce due to the complex interrelations of tuples and contextual ambiguities. In this study, we illustrate the multi-tuple extraction of mechanical properties from multi-principal-element alloys and present a novel framework that combines an entity extraction model based on MatSciBERT with pointer networks and an allocation model utilizing inter- and intra-entity attention. Our experiments on tuple extraction yield F1 scores of 0.963, 0.947, 0.848, and 0.753 on datasets with 1, 2, 3, and 4 tuples, respectively, confirming the effectiveness of the model. Furthermore, an F1 score of 0.854 was achieved on a randomly curated dataset. These results highlight the model's capacity to deliver precise and structured information, offering a robust alternative to large language models and equipping researchers with essential data for fostering data-driven innovations.
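To make the reported metric concrete, the following is a minimal sketch of how a tuple-level F1 score could be computed. It assumes exact-match scoring, where a predicted tuple counts as correct only if every element matches a gold tuple; the abstract does not state the matching criterion, so this is an assumption. The function name `tuple_f1` and the example alloy, property, and value strings are hypothetical and not taken from the paper.

```python
from typing import Iterable, Tuple

def tuple_f1(predicted: Iterable[tuple], gold: Iterable[tuple]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact tuple matching (an assumption)."""
    pred_set = set(predicted)
    gold_set = set(gold)
    # A prediction is a true positive only if the whole tuple matches a gold tuple.
    true_positives = len(pred_set & gold_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical example: one sentence describing two (alloy, property, value) tuples.
gold = [
    ("AlloyA", "yield strength", "300 MPa"),
    ("AlloyB", "yield strength", "450 MPa"),
]
# The second prediction assigns the value to the wrong alloy, so it does not match.
predicted = [
    ("AlloyA", "yield strength", "300 MPa"),
    ("AlloyA", "yield strength", "450 MPa"),
]

precision, recall, f1 = tuple_f1(predicted, gold)
print(precision, recall, f1)
```

The mismatched second prediction illustrates why scores tend to drop as the number of tuples per passage grows: the entities themselves may be recognized correctly, yet allocating them to the wrong tuple still counts as an error under exact matching.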