🤖 AI Summary
To address the lack of systematic decision support for choosing between rule-based and machine learning (ML) approaches in information extraction (IE), this paper introduces REST, a rule-first decision-assistance tool. REST's core contributions are: (1) a visual framework for assessing rule feasibility and predicting rule performance; (2) a hybrid paradigm that makes rules the default choice and uses ML as an on-demand fallback; and (3) rapid rule evaluation and deployment, enabled by a single expert highlighting session. Through an interactive interface, REST combines rule engineering, lightweight ML evaluation, expert knowledge encoding, and multi-dimensional performance modeling (F1 score, development effort, and maintainability). Evaluated on a 12-entity use case, REST shows that rule-based solutions cover 83% of entity types, achieve an average F1 of 0.89, reduce annotation requirements by 67%, and shorten rule development cycles by 52%. The authors argue that the approach improves sustainability, interpretability, and cross-task transferability.
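The F1 score cited above is the harmonic mean of precision and recall. As an illustration only, the sketch below computes exact-match, span-level precision, recall, and F1 of extracted entities against expert highlights. The span representation `(document_id, start, end, entity_type)`, the function name `span_f1`, and the toy data are hypothetical; the summary does not say which matching criterion REST uses.

```python
from typing import Set, Tuple

# Hypothetical span representation: (document_id, start_offset, end_offset, entity_type).
Span = Tuple[str, int, int, str]


def span_f1(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 of predicted spans against gold (expert) spans."""
    true_positives = len(predicted & gold)  # spans matching on document, offsets and type
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


if __name__ == "__main__":
    # Toy, made-up highlights and rule outputs.
    gold = {("doc1", 10, 18, "DATE"), ("doc1", 40, 52, "AMOUNT"), ("doc2", 5, 12, "DATE")}
    predicted = {("doc1", 10, 18, "DATE"), ("doc2", 5, 12, "DATE"), ("doc2", 30, 35, "DATE")}
    p, r, f = span_f1(predicted, gold)
    print(f"precision={p:.2f} recall={r:.2f} f1={f:.2f}")  # precision=0.67 recall=0.67 f1=0.67
```

Exact matching is the strictest criterion; a partial-overlap variant would credit rules that find the right entity with slightly different boundaries.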
📝 Abstract
Compared with ML and LLMs, rules could be the default option for information extraction (IE) in terms of sustainability, transferability, interpretability, and development burden. We propose a sustainable, combined use of rules and ML as an IE method. Our approach starts with exhaustive manual highlighting, by an expert in a single working session, of a representative subset of the data corpus. We developed the REST decision tool, and validated its feasibility and performance metrics, to help the annotator choose, for each entity of an IE task, between rules (the default option) and ML. REST lets the annotator visualize how each entity is formalized in the free texts, together with the expected rule development feasibility and IE performance metrics. ML is treated as a backup IE option, so manual annotation for training is minimized. An external validation of REST on a 12-entity use case showed good reproducibility.
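The abstract does not specify how REST scores rule feasibility, so the following is only a minimal sketch of a per-entity, rule-first decision under stated assumptions. It assumes the expert highlights are available as `(entity_type, surface_form)` pairs, builds a simple lexicon rule from every other highlight, estimates that rule's recall on the remaining highlights, and falls back to ML when the estimate is below a threshold. The threshold value, the lexicon heuristic, all function names, and the toy data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical threshold: minimum estimated recall for rules to remain the default choice.
RULE_RECALL_THRESHOLD = 0.8


def normalize(form: str) -> str:
    """Lowercase and collapse whitespace so trivial variants count as the same form."""
    return " ".join(form.lower().split())


def estimated_lexicon_recall(mentions: List[str]) -> float:
    """Build a lexicon from every other mention and measure how many of the rest it matches."""
    lexicon = {normalize(m) for m in mentions[0::2]}
    held_out = [normalize(m) for m in mentions[1::2]]
    if not held_out:
        return 0.0  # too few highlights to estimate; treated as infeasible in this sketch
    return sum(m in lexicon for m in held_out) / len(held_out)


def choose_method(highlights: List[Tuple[str, str]]) -> Dict[str, str]:
    """Rule-first choice per entity type: 'rules' unless estimated recall is too low, else 'ML'."""
    by_entity: Dict[str, List[str]] = defaultdict(list)
    for entity_type, surface_form in highlights:
        by_entity[entity_type].append(surface_form)
    return {
        entity: "rules" if estimated_lexicon_recall(forms) >= RULE_RECALL_THRESHOLD else "ML"
        for entity, forms in by_entity.items()
    }


if __name__ == "__main__":
    # Toy, made-up highlights: a low-variability entity and a free-text entity.
    highlights = [
        ("CURRENCY", "EUR"), ("CURRENCY", "eur"), ("CURRENCY", "USD"),
        ("CURRENCY", "usd"), ("CURRENCY", "EUR"), ("CURRENCY", "USD"),
        ("REASON", "late delivery"), ("REASON", "wrong size"),
        ("REASON", "arrived broken"), ("REASON", "changed my mind"),
    ]
    print(choose_method(highlights))  # {'CURRENCY': 'rules', 'REASON': 'ML'}
```

An entity whose highlighted forms repeat (few distinct formalizations) is cheap to cover with rules, while an entity expressed in open-ended phrasing is where an ML backup becomes worth its annotation cost. A real tool would also consider pattern-based rules (for example, regular expressions) rather than an exact-match lexicon alone.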