🤖 AI Summary
The existing ASPEN system supports only global merges for entity resolution, which makes it ill-suited to value-level heterogeneity (e.g., resolving “J. Lee” to either “Joy Lee” or “Jake Lee” depending on context), and it lacks optimality criteria that target resolution quality. This paper proposes ASPEN+, which addresses these limitations through three contributions: (1) a local merge mechanism that lets different occurrences of the same value be matched differently depending on context; (2) new optimality criteria that balance minimizing rule violations against maximizing the number of rules supporting a merge, formalized as several notions of optimal solution; and (3) an extension of the original Answer Set Programming (ASP) based system to support these features. Experiments on real-world datasets demonstrate that ASPEN+ significantly improves resolution accuracy while maintaining acceptable runtime.
📝 Abstract
In this paper, we present ASPEN+, which extends an existing ASP-based system for collective entity resolution, ASPEN, with two important functionalities: support for local merges and new optimality criteria for preferred solutions. ASPEN supports only so-called global merges of entity-referring constants (e.g. author ids), in which all occurrences of matched constants are treated as equivalent and merged accordingly. However, it has been argued that local merges are often more appropriate when resolving data values: for example, some instances of 'J. Lee' may refer to 'Joy Lee', while others should be matched with 'Jake Lee'. In addition to allowing such local merges, ASPEN+ offers new optimality criteria for selecting solutions, such as minimising rule violations or maximising the number of rules supporting a merge. Our main contributions are thus (1) the formalisation and computational analysis of various notions of optimal solution, and (2) an extensive experimental evaluation on real-world datasets, demonstrating the effect of local merges and the new optimality criteria on both accuracy and runtime.
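
The difference between global and local merges can be made concrete with a minimal Python sketch. This is not how ASPEN or ASPEN+ work internally (both are ASP-based). The toy table, the function names `global_merge` and `local_merge`, and the (row index, attribute) scheme for addressing a single occurrence are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Row = Dict[str, str]
# Hypothetical way to address one occurrence of a value: (row index, attribute name).
Cell = Tuple[int, str]

# Toy data, invented for illustration only.
ROWS: List[Row] = [
    {"paper": "p1", "author": "J. Lee"},
    {"paper": "p2", "author": "J. Lee"},
    {"paper": "p3", "author": "Joy Lee"},
    {"paper": "p4", "author": "Jake Lee"},
]


def global_merge(rows: List[Row], mapping: Dict[str, str]) -> List[Row]:
    """Replace every occurrence of a matched constant by its representative."""
    return [
        {attr: mapping.get(val, val) for attr, val in row.items()}
        for row in rows
    ]


def local_merge(rows: List[Row], cell_mapping: Dict[Cell, str]) -> List[Row]:
    """Replace only the individually addressed occurrences (cells)."""
    return [
        {attr: cell_mapping.get((i, attr), val) for attr, val in row.items()}
        for i, row in enumerate(rows)
    ]


# Global: all 'J. Lee' occurrences are forced onto the same target.
global_result = global_merge(ROWS, {"J. Lee": "Joy Lee"})

# Local: each 'J. Lee' occurrence may be matched with a different value.
local_result = local_merge(
    ROWS, {(0, "author"): "Joy Lee", (1, "author"): "Jake Lee"}
)

print([r["author"] for r in global_result])
print([r["author"] for r in local_result])
```

Under the global merge, both occurrences of 'J. Lee' collapse onto 'Joy Lee'. Under the local merge, the occurrence in the first row becomes 'Joy Lee' and the one in the second row becomes 'Jake Lee'.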