🤖 AI Summary
This study addresses person name recognition in Portuguese texts, an important but non-trivial information extraction task, by proposing a strategy for assembling local grammars (LGs) based on concordance comparison. A tool compares the concordances obtained from two LGs and highlights their differences. By analyzing the inclusion, intersection, and disjunction relationships observed within each pair of LGs, the authors select and combine the grammars that yield the best results. Applied to the Gold Collection of the Second HAREM, the enhanced grammar achieves an F-measure of 76.86, a gain of 6 points over the state of the art for Portuguese.
📝 Abstract
Named Entity Recognition for person names is an important but non-trivial task in information extraction. This article uses a tool that compares the concordances obtained from two local grammars (LG) and highlights the differences. We used the results as an aid to select the best of a set of LGs. By analyzing the comparisons, we observed relationships of inclusion, intersection and disjunction within each pair of LGs, which helped us to assemble those that yielded the best results. This approach was used in a case study on extraction of person names from texts written in Portuguese. We applied the enhanced grammar to the Gold Collection of the Second HAREM. The F-Measure obtained was 76.86, representing a gain of 6 points in relation to the state-of-the-art for Portuguese.
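The pairwise comparison described in the abstract can be illustrated with a minimal sketch. It is not the tool the authors used: it assumes that each concordance can be represented as a set of `(start, end, text)` matches, and the names `Match` and `compare_concordances` are hypothetical, introduced here only for illustration.

```python
from typing import Dict, Set, Tuple

# Assumption: one concordance line = (start offset, end offset, matched text).
Match = Tuple[int, int, str]


def compare_concordances(a: Set[Match], b: Set[Match]) -> Dict[str, object]:
    """Classify the relationship between the matches of two local grammars."""
    if a == b:
        relation = "identical"
    elif a <= b:
        relation = "inclusion (A within B)"
    elif b <= a:
        relation = "inclusion (B within A)"
    elif a & b:
        relation = "intersection"
    else:
        relation = "disjunction"
    return {
        "relation": relation,
        "only_a": a - b,   # matches found only by grammar A
        "only_b": b - a,   # matches found only by grammar B
        "shared": a & b,   # matches found by both grammars
    }


# Hypothetical matches from two grammars run over the same text.
lg_a = {(0, 10, "João Silva"), (25, 36, "Maria Costa")}
lg_b = {(0, 10, "João Silva"), (50, 60, "Ana Santos")}

result = compare_concordances(lg_a, lg_b)
print(result["relation"])
print(sorted(result["only_a"]), sorted(result["only_b"]))
```

Under these assumptions, two grammars in an intersection or disjunction relationship each contribute matches the other misses, which makes them candidates for assembly, while a grammar fully included in another adds no new matches.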