🤖 AI Summary
Deletion propagation (DP) is a classical database problem: given tuples to delete from a view, infer which tuples to delete from the underlying source data. Progress has long been hindered by fragmented problem variants, disjoint complexity analyses, and algorithms that lack generality. This paper introduces a unified framework that captures all prior DP variants, covers settings such as self-joins, unions, and bag semantics, and supports real-world applications such as GDPR compliance and query explanation. Methodologically, it presents a general, coarse-grained instance-optimal algorithm grounded in constraint satisfaction and data-driven optimization, which needs no structural properties of the data as input and adheres to SQL semantics and complex relational operators. Theoretically, the paper provides new complexity results; practically, the algorithm is guaranteed to terminate in polynomial time for all currently known tractable cases. Experiments show that it is competitive with, and at times orders of magnitude faster than, specialized algorithms, and they provide the first experimental results for variants previously studied only in theory.
📝 Abstract
Deletion Propagation problems are a family of database problems that have been studied for over 40 years. They are variants of the classical view-update problem in which intended tuple deletions in the view (the output of a query) are propagated back to the source (the input database) in a manner that obeys certain constraints while minimizing side effects. Problems from this family have been used in domains as diverse as GDPR compliance, effective SQL pedagogy, and query explanations. So far, however, these variants, their complexity, and practical algorithms have always been studied in isolation. In this paper, we unify Deletion Propagation (DP) problems in a single generalized framework that comes with several appealing benefits: (1) Our approach not only captures all prior deletion propagation variants but also introduces a whole family of new and well-motivated problems. (2) Our algorithmic solution is general and practical. It solves problems "coarse-grained instance-optimally": our algorithm is not only guaranteed to terminate in polynomial time (PTIME) for all currently known PTIME cases, it can also leverage regularities in the data without explicitly receiving them as input (knowing about certain structural properties of the data is often a prerequisite for a specialized algorithm to be applicable). (3) Our approach is not only practical (easy to implement), it is also competitive with, and at times orders of magnitude faster than, prior PTIME approaches specialized for each problem. For variants of the problem that have so far been studied only theoretically, we show the first experimental results. (4) Our approach is complete. It can solve all problem variants and covers all settings, even those that have previously been notoriously difficult to study, such as queries with self-joins, unions, and bag semantics, and it also allows us to provide new complexity results.
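
To make the problem statement concrete, here is a minimal brute-force sketch of one possible DP variant on a toy instance. It is not the paper's algorithm and runs in exponential time. The query `Q(x, z) :- R(x, y), S(y, z)`, the objective (fewest source deletions, ties broken by fewest lost view tuples), and all names are assumptions chosen for illustration only.

```python
from itertools import combinations


def evaluate(R, S):
    """Evaluate the assumed query Q(x, z) :- R(x, y), S(y, z) under set semantics."""
    return {(x, z) for (x, y) in R for (y2, z) in S if y == y2}


def propagate_deletion(R, S, target):
    """Brute force: find a smallest set of source tuples whose deletion removes
    `target` from the view, preferring sets that lose the fewest other view tuples.
    Returns (deleted_source_tuples, side_effects), or None if no set works."""
    source = [("R", t) for t in sorted(R)] + [("S", t) for t in sorted(S)]
    view = evaluate(R, S)
    for k in range(len(source) + 1):  # try deletion sets of increasing size
        best = None
        for deleted in combinations(source, k):
            R2 = R - {t for (rel, t) in deleted if rel == "R"}
            S2 = S - {t for (rel, t) in deleted if rel == "S"}
            new_view = evaluate(R2, S2)
            if target in new_view:
                continue  # the unwanted view tuple survived
            side_effects = view - new_view - {target}
            if best is None or len(side_effects) < len(best[1]):
                best = (set(deleted), side_effects)
        if best is not None:
            return best
    return None


# Toy instance: (1, 10) is derived twice, through y = 'a' and through y = 'b'.
R = {(1, "a"), (1, "b")}
S = {("a", 10), ("b", 10), ("b", 20)}
deleted, side_effects = propagate_deletion(R, S, target=(1, 10))
print(deleted, side_effects)
```

On this instance no single source deletion removes `(1, 10)`, because the tuple has two derivations. Among the two-tuple deletion sets, some also destroy `(1, 20)` while others leave it intact, which is the kind of side-effect trade-off the abstract refers to.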