Advancing Fact Attribution for Query Answering: Aggregate Queries and Novel Algorithms

📅 2025-06-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the problem of quantifying how much each input tuple contributes to the result of an SQL query, including queries with aggregate operations such as SUM and COUNT. Contributions are measured by exact Banzhaf and Shapley values from cooperative game theory. Unlike prior algorithmic work, which targets Select-Project-Join-Union queries, it gives the first practical approach for queries with aggregates. The method rests on two optimizations: (1) equivalence-based grouping, which exploits the observation that many input tuples have the same contribution, so the contribution needs to be computed for only one tuple per group; and (2) a gradient-based computation over the query lineage, which obtains the contributions of all tuples at the same complexity as computing the contribution of a single tuple. Experiments with a million instances over three databases show speedups of up to 1000× (three orders of magnitude) over the state of the art for queries without aggregates, and show that the approach is practical for aggregate queries.
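
To make the two attribution measures concrete, here is a minimal brute-force sketch of their textbook definitions (the unnormalized Banzhaf value and the Shapley value). It is not the paper's algorithm. It assumes a hypothetical encoding in which every fact is a player and a query is a function from the set of present facts to a number: 0/1 for a Boolean query's lineage, or the aggregate's value for an aggregate query. The names `banzhaf`, `shapley`, and `lineage` are illustrative only. The enumeration is exponential in the number of facts, which is why a practical algorithm has to avoid it.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Hashable, Iterator, List, Tuple

# Hypothetical encoding: a "game" maps the set of present facts to a number,
# e.g. 0/1 for a Boolean query's lineage or the value of an aggregate query.
Game = Callable[[FrozenSet[Hashable]], int]


def _subsets(others: List[Hashable]) -> Iterator[Tuple[int, FrozenSet[Hashable]]]:
    """Yield (size, subset) for every subset of `others`."""
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            yield k, frozenset(subset)


def banzhaf(players: List[Hashable], v: Game) -> Dict[Hashable, Fraction]:
    """Unnormalized Banzhaf value: sum of marginal contributions over all subsets."""
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        values[i] = Fraction(sum(v(s | {i}) - v(s) for _, s in _subsets(others)))
    return values


def shapley(players: List[Hashable], v: Game) -> Dict[Hashable, Fraction]:
    """Shapley value: marginal contributions weighted by |S|! (n-|S|-1)! / n!."""
    n = len(players)
    values = {}
    for i in players:
        others = [p for p in players if p != i]
        total = Fraction(0)
        for k, s in _subsets(others):
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            total += weight * (v(s | {i}) - v(s))
        values[i] = total
    return values


# Hypothetical lineage of a Boolean join query: s1 AND (r1 OR r2).
def lineage(present: FrozenSet[Hashable]) -> int:
    return int("s1" in present and ("r1" in present or "r2" in present))


facts = ["r1", "r2", "s1"]
print(banzhaf(facts, lineage))  # r1 -> 1, r2 -> 1, s1 -> 3
print(shapley(facts, lineage))  # r1 -> 1/6, r2 -> 1/6, s1 -> 2/3
```

The Shapley values sum to 1, the query's value on all facts minus its value on none, while the unnormalized Banzhaf values carry no such constraint. The same functions accept an aggregate, for instance `lambda S: sum(w[f] for f in S)` for a hypothetical SUM over weights `w`, in which case each fact's Shapley value is its own weight.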

📝 Abstract
In this paper, we introduce a novel approach to computing the contribution of input tuples to the query result, quantified by the Banzhaf and Shapley values. In contrast to prior algorithmic work that focuses on Select-Project-Join-Union queries, ours is the first practical approach for queries with aggregates. It relies on two novel optimizations that are essential for its practicality and significantly improve runtime performance even for queries without aggregates. The first optimization exploits the observation that many input tuples have the same contribution to the query result, so it is enough to compute the contribution of one of them. The second optimization uses the gradient of the query lineage to compute the contributions of all tuples with the same complexity as computing the contribution of one of them. Experiments with a million instances over 3 databases show that our approach achieves up to 3 orders of magnitude runtime improvements over the state of the art for queries without aggregates, and that it is practical for aggregate queries.
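
The gradient idea can be illustrated with a standard identity rather than the paper's own algorithm: for a Boolean lineage F over n facts, the unnormalized Banzhaf value of fact x equals 2^(n-1) times the partial derivative of F's multilinear extension with respect to x, evaluated with every probability set to 1/2. Reverse-mode differentiation then yields the derivatives for all facts in one backward pass, at a constant-factor overhead over evaluating F once. The sketch below assumes, for simplicity, that the lineage is read-once (every fact occurs once) and that the players are exactly the facts occurring in it, so the probability can be computed gate by gate. The tuple encoding and the names `banzhaf_via_gradient`, `_forward`, and `_backward` are hypothetical, and only the Banzhaf value is covered.

```python
from fractions import Fraction
from typing import Dict, List, Tuple, Union

# Hypothetical read-once lineage encoding (each fact occurs at most once):
#   ("var", fact) | ("and", [children]) | ("or", [children])
Node = Tuple[str, Union[str, List["Node"]]]


def _product(xs: List[Fraction]) -> Fraction:
    out = Fraction(1)
    for x in xs:
        out *= x
    return out


def _products_without_each(xs: List[Fraction]) -> List[Fraction]:
    """Entry j is the product of all xs[k] with k != j (prefix/suffix, no division)."""
    prefix = [Fraction(1)]
    for x in xs:
        prefix.append(prefix[-1] * x)
    suffix = [Fraction(1)]
    for x in reversed(xs):
        suffix.append(suffix[-1] * x)
    suffix.reverse()
    return [prefix[j] * suffix[j + 1] for j in range(len(xs))]


def _forward(node: Node, p: Fraction):
    """Return (Pr[node is true], child results), each fact present independently w.p. p."""
    kind, payload = node
    if kind == "var":
        return p, []
    kids = [_forward(child, p) for child in payload]
    probs = [prob for prob, _ in kids]
    if kind == "and":
        return _product(probs), kids
    if kind == "or":
        return 1 - _product([1 - q for q in probs]), kids
    raise ValueError(f"unknown gate {kind!r}")


def _backward(node: Node, fwd, adjoint: Fraction, grad: Dict[str, Fraction]) -> None:
    """Push d(root)/d(node) = adjoint down to the leaves (reverse-mode differentiation)."""
    kind, payload = node
    if kind == "var":
        if payload in grad:
            raise ValueError(f"{payload!r} occurs twice: lineage is not read-once")
        grad[payload] = adjoint
        return
    probs = [prob for prob, _ in fwd[1]]
    if kind == "and":  # d(prod c)/dc_j = prod_{k != j} c_k
        local = _products_without_each(probs)
    else:  # d(1 - prod(1 - c))/dc_j = prod_{k != j} (1 - c_k)
        local = _products_without_each([1 - q for q in probs])
    for child, child_fwd, d in zip(payload, fwd[1], local):
        _backward(child, child_fwd, adjoint * d, grad)


def banzhaf_via_gradient(root: Node) -> Dict[str, Fraction]:
    """All unnormalized Banzhaf values from one forward and one backward pass."""
    fwd = _forward(root, Fraction(1, 2))
    grad: Dict[str, Fraction] = {}
    _backward(root, fwd, Fraction(1), grad)
    n = len(grad)
    # Banzhaf(x) = 2^(n-1) * dF/dp_x at p = 1/2, F = multilinear extension of the lineage.
    return {fact: g * 2 ** (n - 1) for fact, g in grad.items()}


root = ("and", [("var", "s1"), ("or", [("var", "r1"), ("var", "r2")])])
print(banzhaf_via_gradient(root))  # s1 -> 3, r1 -> 1, r2 -> 1
```

These values agree with enumerating the Banzhaf definition directly for the lineage s1 AND (r1 OR r2). A lineage that is not read-once would first need to be compiled into a form whose AND gates combine independent subformulas, which this sketch does not attempt.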
Problem

Computing input tuples' contribution to query results
First practical approach for aggregate queries
Novel optimizations for runtime performance improvement
Innovation

Novel approach for computing tuple contributions
Optimization by grouping tuples with the same contribution
Gradient-based computation for efficient contribution analysis
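
The grouping idea can be sketched with one simple, sufficient test for "same contribution", chosen here for illustration and not necessarily the criterion the paper uses: in a read-once lineage, facts that are direct leaf children of the same AND or OR gate can be swapped without changing the formula, so they must receive the same Banzhaf (and Shapley) value. The sketch assumes a hypothetical tuple encoding of the lineage and computes one brute-force value per class instead of one per fact; `sibling_classes` and `banzhaf_by_class` are illustrative names.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical read-once lineage encoding: ("var", fact) | ("and", [...]) | ("or", [...])
Node = Tuple[str, Union[str, List["Node"]]]


def evaluate(node: Node, present: FrozenSet[str]) -> bool:
    kind, payload = node
    if kind == "var":
        return payload in present
    results = (evaluate(child, present) for child in payload)
    return all(results) if kind == "and" else any(results)


def facts_of(node: Node) -> List[str]:
    kind, payload = node
    if kind == "var":
        return [payload]
    return [f for child in payload for f in facts_of(child)]


def sibling_classes(node: Node) -> List[List[str]]:
    """Group facts that are leaf children of the same gate.

    Swapping two such facts leaves the formula unchanged (AND/OR are commutative),
    so they must receive equal Banzhaf and Shapley values.
    """
    kind, payload = node
    if kind == "var":
        return [[payload]]
    classes = []
    leaves = [child[1] for child in payload if child[0] == "var"]
    if leaves:
        classes.append(leaves)
    for child in payload:
        if child[0] != "var":
            classes.extend(sibling_classes(child))
    return classes


def banzhaf_of(fact: str, facts: List[str], root: Node) -> int:
    """Brute-force unnormalized Banzhaf value of one fact."""
    others = [f for f in facts if f != fact]
    total = 0
    for k in range(len(others) + 1):
        for subset in combinations(others, k):
            s = frozenset(subset)
            total += int(evaluate(root, s | {fact})) - int(evaluate(root, s))
    return total


def banzhaf_by_class(root: Node) -> Dict[str, int]:
    """Compute one value per class and copy it to every member."""
    facts = facts_of(root)
    values = {}
    for cls in sibling_classes(root):
        value = banzhaf_of(cls[0], facts, root)  # one computation per class
        for fact in cls:
            values[fact] = value
    return values


root = ("and", [("or", [("var", "r1"), ("var", "r2"), ("var", "r3")]), ("var", "s1")])
print(sibling_classes(root))   # [['s1'], ['r1', 'r2', 'r3']]
print(banzhaf_by_class(root))  # {'s1': 7, 'r1': 1, 'r2': 1, 'r3': 1}
```

Here two computations replace four. The test only detects facts that are syntactically interchangeable; facts with equal values for other reasons stay in separate classes, which costs extra work but never produces a wrong value.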
Authors

Omer Abramovich, Tel Aviv University, Israel
Daniel Deutch, Tel Aviv University (Databases, Web Data Management)
Nave Frost, eBay Research, Israel
Ahmet Kara, OTH Regensburg, Germany
Dan Olteanu, Professor of Computer Science, University of Zurich (databases, database systems, database theory, data management)