🤖 AI Summary
This paper addresses the challenge of retrieving long-tail quantitative facts (numerical values and their contextual evidence) from unstructured documents, proposing the novel task of “quantity retrieval.” Methodologically, it introduces an end-to-end framework grounded in quantity description parsing: (1) explicitly modeling the semantic structure of quantity phrases; (2) automatically generating large-scale paraphrase training data via weak supervision based on quantity co-occurrence patterns; and (3) designing a semantic matching model for joint localization of values and supporting evidence. Evaluated on financial annual reports and a newly constructed annotated dataset, the approach achieves a top-1 accuracy of 64.66%, substantially outperforming the baseline (30.98%). Key contributions include: (i) formal definition of the quantity retrieval task; (ii) an interpretable, structure-aware quantity parsing paradigm; and (iii) the first weakly supervised pipeline for constructing training data specifically tailored to quantitative fact retrieval.
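The weak-supervision idea in step (2) can be illustrated with a toy sketch. It is not the paper's pipeline. It assumes that two descriptions attached to exactly the same value and unit are likely paraphrases of each other. The function name `paraphrase_candidates`, the tuple layout, and the sample data are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def paraphrase_candidates(pairs):
    """Return description pairs that co-occur with an identical quantity.

    `pairs` is an iterable of (description, value, unit) tuples.
    Assumption for illustration: sharing the same (value, unit) is a weak
    signal that two descriptions refer to the same fact.
    """
    by_quantity = defaultdict(list)
    for description, value, unit in pairs:
        by_quantity[(value, unit)].append(description)

    candidates = []
    for descriptions in by_quantity.values():
        # Deduplicate and sort so the output order is deterministic.
        unique = sorted(set(descriptions))
        candidates.extend(combinations(unique, 2))
    return candidates


# Invented sample data, not taken from the paper's corpus.
pairs = [
    ("total revenue for fiscal 2020", 1234.5, "million USD"),
    ("revenue in the 2020 fiscal year", 1234.5, "million USD"),
    ("net income for fiscal 2020", 210.0, "million USD"),
]
result = paraphrase_candidates(pairs)
```

Exact value matching is noisy in practice, since unrelated facts can share a number; a real system would likely need additional filtering, which this sketch omits.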
📝 Abstract
Quantitative facts are continually generated by companies and governments, supporting data-driven decision-making. While common facts are structured, many long-tail quantitative facts remain buried in unstructured documents, making them difficult to access. We propose the task of Quantity Retrieval: given a description of a quantitative fact, the system returns the relevant value and supporting evidence. Understanding quantity semantics in context is essential for this task. We introduce a framework based on description parsing that converts text into structured (description, quantity) pairs for effective retrieval. To improve learning, we construct a large paraphrase dataset using weak supervision based on quantity co-occurrence. We evaluate our approach on a large corpus of financial annual reports and a newly annotated quantity description dataset. Our method significantly improves top-1 retrieval accuracy from 30.98 percent to 64.66 percent.
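To make the task interface concrete, here is a minimal sketch of retrieval over structured (description, quantity) pairs. It assumes parsing has already produced the pairs, and it swaps in plain token-overlap (Jaccard) scoring as a stand-in for the learned matching model the abstract refers to. The names `QuantityFact`, `tokens`, `jaccard`, and `retrieve`, and the sample records, are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class QuantityFact:
    description: str  # parsed description of the quantity
    quantity: str     # value with unit, kept as text
    evidence: str     # source sentence supporting the value


def tokens(text):
    """Lowercase alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def retrieve(query, facts):
    """Return the fact whose description best overlaps the query.

    Stand-in scoring only; `facts` must be non-empty.
    """
    q = tokens(query)
    return max(facts, key=lambda f: jaccard(q, tokens(f.description)))


# Invented sample records, not taken from the paper's corpus.
facts = [
    QuantityFact(
        "total revenue for fiscal 2020",
        "1,234.5 million USD",
        "Total revenue for fiscal 2020 was 1,234.5 million USD.",
    ),
    QuantityFact(
        "net income for fiscal 2020",
        "210.0 million USD",
        "Net income for fiscal 2020 was 210.0 million USD.",
    ),
]
best = retrieve("fiscal 2020 net income", facts)
```

Here `best.quantity` is the returned value and `best.evidence` is the supporting sentence, mirroring the (value, evidence) output that the task definition describes.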