Enhanced Multi-Tuple Extraction for Alloys: Integrating Pointer Networks and Augmented Attention

πŸ“… 2025-03-10
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ“„ PDF
πŸ€– AI Summary
Low accuracy in extracting structured multi-attribute tuples (e.g., mechanical properties) from multicomponent alloy literature arises from strong tuple interdependence and contextual ambiguity. To address this, we propose the first end-to-end joint extraction framework comprising three stages: MatSciBERT-based entity recognition, pointer-network-driven boundary detection, and cross- and intra-entity enhanced attention mechanisms. Unlike conventional sequence-labeling approaches, our method explicitly models complex tuple relationships through joint training. Evaluated on standardized 1–4-argument tuple datasets, it achieves F1 scores of 0.963–0.753; on a newly constructed random dataset, it attains an F1 of 0.854β€”substantially outperforming existing methods. This work provides a high-precision, scalable technical foundation for materials science text structuring and data-driven materials design.

Technology Category

Application Category

πŸ“ Abstract
Extracting high-quality structured information from scientific literature is crucial for advancing material design through data-driven methods. Despite the considerable research in natural language processing for dataset extraction, effective approaches for multi-tuple extraction in scientific literature remain scarce due to the complex interrelations of tuples and contextual ambiguities. In the study, we illustrate the multi-tuple extraction of mechanical properties from multi-principal-element alloys and presents a novel framework that combines an entity extraction model based on MatSciBERT with pointer networks and an allocation model utilizing inter- and intra-entity attention. Our rigorous experiments on tuple extraction demonstrate impressive F1 scores of 0.963, 0.947, 0.848, and 0.753 across datasets with 1, 2, 3, and 4 tuples, confirming the effectiveness of the model. Furthermore, an F1 score of 0.854 was achieved on a randomly curated dataset. These results highlight the model's capacity to deliver precise and structured information, offering a robust alternative to large language models and equipping researchers with essential data for fostering data-driven innovations.
Problem

Research questions and friction points this paper is trying to address.

Extract structured information from scientific literature
Address multi-tuple extraction challenges in alloys
Improve data-driven material design through advanced NLP
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines MatSciBERT with pointer networks
Utilizes inter- and intra-entity attention
Achieves high F1 scores for multi-tuple extraction
πŸ”Ž Similar Papers
No similar papers found.
M
Mengzhe Hei
National Key Laboratory of Information Systems Engineering, National University of Defense Technology, Changsha, 410072, Hunan, China
Z
Zhouran Zhang
Department of Materials Science and Engineering, National University of Defense Technology, Changsha, 410072, Hunan, China
Q
Qingbao Liu
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha, 410072, Hunan, China
Yan Pan
Yan Pan
School of Data and Computer Science, Sun Yat-Sen University
machine learninginformation retrieval
X
Xiang Zhao
Laboratory for Big Data and Decision, National University of Defense Technology, Changsha, 410072, Hunan, China
Y
Yongqian Peng
Department of Materials Science and Engineering, National University of Defense Technology, Changsha, 410072, Hunan, China
Y
Yicong Ye
Department of Materials Science and Engineering, National University of Defense Technology, Changsha, 410072, Hunan, China
X
Xin Zhang
National Key Laboratory of Information Systems Engineering, National University of Defense Technology, Changsha, 410072, Hunan, China
S
Shuxin Bai
Department of Materials Science and Engineering, National University of Defense Technology, Changsha, 410072, Hunan, China