Rules still work for Open Information Extraction

📅 2024-03-16
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
Chinese Open Information Extraction (OIE) is held back by the syntactic complexity of Chinese, by heavy reliance on manually crafted patterns, and by opaque end-to-end models. This paper proposes APRCOIE, an OIE framework for Chinese text, to address these problems. First, it defines a new extraction pattern form suited to Chinese syntax. Second, it applies a preliminary filter based on tensor computing so that the extraction procedure runs efficiently. Third, it generates extraction patterns automatically instead of depending on hand-written rules. To train the model, the authors manually annotated a large-scale Chinese OIE dataset. In the comparative evaluation, APRCOIE outperforms state-of-the-art Chinese OIE models. Both the code and the dataset are publicly released on GitHub.

📝 Abstract
Open information extraction (OIE) aims to extract surface relations and their corresponding arguments from natural language text, irrespective of domain. This paper presents an innovative OIE model, APRCOIE, tailored for Chinese text. Unlike previous models, APRCOIE generates extraction patterns autonomously. The model defines a new pattern form for Chinese OIE and proposes an automated pattern generation methodology. In this way, the model can handle a wide array of complex and diverse Chinese grammatical phenomena. We design a preliminary filter based on tensor computing to conduct the extraction procedure efficiently. To train the model, we manually annotated a large-scale Chinese OIE dataset. In the comparative evaluation, we demonstrate that APRCOIE outperforms state-of-the-art Chinese OIE models and significantly expands the boundaries of achievable OIE performance. The code of APRCOIE and the annotated dataset are released on GitHub (https://github.com/jialin666/APRCOIE_v1).
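The abstract mentions a preliminary filter that cheaply rules out patterns before full matching, but gives no detail. The sketch below is one possible reading of that idea and is not the authors' implementation: each pattern and each sentence is encoded as a 0/1 vector over a label vocabulary, and a dot product checks whether a sentence contains every label a pattern requires. The label set, the patterns, and the function names are all hypothetical. Plain Python lists stand in for tensors so the example has no dependencies.

```python
# Hypothetical label vocabulary (dependency-relation tags); not taken from the paper.
LABELS = ["nsubj", "dobj", "prep", "pobj", "advmod"]
INDEX = {label: i for i, label in enumerate(LABELS)}


def multi_hot(labels):
    """Encode a collection of labels as a 0/1 vector over LABELS."""
    vec = [0] * len(LABELS)
    for label in labels:
        vec[INDEX[label]] = 1
    return vec


def candidate_patterns(pattern_matrix, sentence_vec):
    """Return indices of patterns whose required labels all occur in the sentence.

    A pattern survives when (row . sentence_vec) == sum(row), meaning every
    label it requires is present. Only survivors need full structural matching.
    """
    survivors = []
    for i, row in enumerate(pattern_matrix):
        overlap = sum(r * s for r, s in zip(row, sentence_vec))
        if overlap == sum(row):
            survivors.append(i)
    return survivors


# Three hypothetical patterns, each described by the labels it requires.
PATTERNS = [
    multi_hot(["nsubj", "dobj"]),          # subject, verb, object
    multi_hot(["nsubj", "prep", "pobj"]),  # subject, verb, prepositional object
    multi_hot(["advmod", "dobj"]),         # adverbial modifier plus object
]

# Labels observed in one parsed sentence (hypothetical parser output).
sentence = multi_hot(["nsubj", "dobj", "advmod"])
print(candidate_patterns(PATTERNS, sentence))  # prints [0, 2]
```

With an array library, the loop would collapse into a single matrix-vector product compared against the row sums, which is where a filter of this kind would gain its speed over matching every pattern against every sentence.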
Problem

Research questions and friction points this paper is trying to address.

Chinese Open Information Extraction
Accuracy Improvement
Complex Grammar Handling
Innovation

Methods, ideas, or system contributions that make the work stand out.

APRCOIE
Chinese Open Information Extraction
Optimized Grammar Processing
Jialin Hua
Key Laboratory of Data Science in Finance and Economics, and School of Statistics and Data Science, Jiangxi University of Finance and Economics, Nanchang, China
Liangqing Luo
Key Laboratory of Data Science in Finance and Economics, and School of Statistics and Data Science, Jiangxi University of Finance and Economics, Nanchang, China
Weiying Ping
Key Laboratory of Data Science in Finance and Economics, and School of Statistics and Data Science, Jiangxi University of Finance and Economics, Nanchang, China
Yan Liao
Key Laboratory of Data Science in Finance and Economics, and School of Statistics and Data Science, Jiangxi University of Finance and Economics, Nanchang, China
Chunhai Tao
Key Laboratory of Data Science in Finance and Economics, and School of Statistics and Data Science, Jiangxi University of Finance and Economics, Nanchang, China
Xuewen Lu
Department of Mathematics and Statistics, University of Calgary, Calgary, Canada