🤖 AI Summary
To address the challenge of automatically identifying domain-specific technical terms and their hypernyms in construction technical specifications, this paper proposes an end-to-end hypernym relation extraction framework. First, it improves the quality of the extracted domain terms by combining statistical analysis, n-gram mining, linguistic rules, and web-query-assisted term pruning. Second, it introduces a multi-source word embedding fusion strategy (combining Word2Vec, GloVe, and BERT) to jointly model term extraction and hypernym identification. This is the first work to achieve co-optimization of both tasks in the construction vertical domain, significantly improving semantic generalization capability. Manual evaluation by six domain experts yields a term identification accuracy of 92.3%, and evaluation on several datasets yields a hypernym recognition F1-score of 86.7%, outperforming state-of-the-art baseline methods.
📝 Abstract
This article presents a complete process for extracting hypernym relationships in the construction domain in two main steps: terminology extraction, followed by detection of hypernyms among the extracted terms. We first describe the corpus analysis method used to extract terminology from a collection of construction technical specifications. Using statistics and word n-gram analysis, we extract the domain's terminology, then prune it with linguistic patterns and internet queries to improve the quality of the final terminology. Second, we present a machine-learning approach, based on various word embedding models and their combinations, for detecting hypernyms within the extracted terminology. The extracted terminology is evaluated manually by six domain experts, and the hypernym identification method is evaluated on several datasets. The overall approach provides relevant and promising results.
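As a rough illustration of the first step, the sketch below counts word n-grams in a toy corpus and prunes candidates with a single boundary rule. It is a minimal sketch under stated assumptions, not the authors' method: the stopword list, the `max_n` and `min_count` thresholds, the "no stopword at either end" rule, and the sample sentences are all hypothetical, and the internet-query pruning step is omitted.

```python
from collections import Counter
import re

# Hypothetical stopword list; a real system would use a fuller, domain-tuned one.
STOPWORDS = {"the", "a", "an", "of", "in", "and", "to", "with",
             "for", "is", "be", "shall", "on"}


def tokenize(text):
    """Lowercase the text and keep runs of ASCII letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidate_terms(documents, max_n=3, min_count=2):
    """Count word n-grams (n = 1..max_n) and keep the frequent ones.

    Pruning rule (an assumption standing in for linguistic patterns):
    a candidate may not begin or end with a stopword.
    """
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    # Frequency threshold: drop n-grams seen fewer than min_count times.
    return {term: c for term, c in counts.items() if c >= min_count}


# Toy corpus (invented sentences, not taken from the paper's data).
docs = [
    "The reinforced concrete slab shall be cast in place.",
    "Reinforced concrete shall comply with the structural drawings.",
    "The concrete slab is laid on the insulation layer.",
]
terms = candidate_terms(docs)
for term, count in sorted(terms.items()):
    print(term, count)
```

On a real corpus the frequency threshold would likely be replaced or complemented by a termhood statistic, and multiword candidates such as "reinforced concrete" and "concrete slab" would then feed the hypernym detection step.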