Towards Data Valuation via Asymmetric Data Shapley

📅 2024-11-01
🏛️ arXiv.org
📈 Citations: 1
Influential: 0
🤖 AI Summary
Traditional Shapley value-based data valuation can be inaccurate on real-world datasets that exhibit heterogeneity and complex dependency structures, because its symmetry axiom implicitly treats all data sources as homogeneous. Method: This paper proposes a structure-aware asymmetric data Shapley framework that relaxes the symmetry axiom so that inherent structure within a dataset can be incorporated into the valuation. It also introduces an efficient k-nearest neighbor-based algorithm that computes the asymmetric data Shapley value exactly. Contribution/Results: The framework's practical applicability is demonstrated across various machine learning tasks and data market contexts, and the code is publicly available.

📝 Abstract
As data emerges as a vital driver of technological and economic advancements, a key challenge is accurately quantifying its value in algorithmic decision-making. The Shapley value, a well-established concept from cooperative game theory, has been widely adopted to assess the contribution of individual data sources in supervised machine learning. However, its symmetry axiom assumes all players in the cooperative game are homogeneous, which overlooks the complex structures and dependencies present in real-world datasets. To address this limitation, we extend the traditional data Shapley framework to asymmetric data Shapley, making it flexible enough to incorporate inherent structures within the datasets for structure-aware data valuation. We also introduce an efficient $k$-nearest neighbor-based algorithm for its exact computation. We demonstrate the practical applicability of our framework across various machine learning tasks and data market contexts. The code is available at: https://github.com/xzheng01/Asymmetric-Data-Shapley.
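For context, the classical data Shapley value referred to in the abstract can be written as an average of marginal contributions over all orderings of the data points. The notation below ($N$ for the set of $n$ data points, $U$ for the utility function, $\Pi(N)$ for the set of permutations, $P_i^{\pi}$ for the points preceding $i$ in $\pi$) is chosen here for illustration and is not taken from the paper:

$$
\phi_i(U) = \frac{1}{n!} \sum_{\pi \in \Pi(N)} \left[ U\!\left(P_i^{\pi} \cup \{i\}\right) - U\!\left(P_i^{\pi}\right) \right]
$$

One common way to relax the symmetry axiom, assumed here rather than quoted from the paper, is to replace the uniform weight $1/n!$ with a non-uniform weight $w(\pi)$ over permutations, where $w(\pi) \ge 0$ and $\sum_{\pi} w(\pi) = 1$:

$$
\phi_i^{w}(U) = \sum_{\pi \in \Pi(N)} w(\pi) \left[ U\!\left(P_i^{\pi} \cup \{i\}\right) - U\!\left(P_i^{\pi}\right) \right]
$$

Because the marginal contributions along any single permutation telescope to $U(N) - U(\emptyset)$, the values still sum to $U(N) - U(\emptyset)$ under any such weighting.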
Problem

Research questions and friction points this paper is trying to address.

Quantifying data value in algorithmic decision-making processes
Overcoming the limitation that the Shapley value's symmetry axiom imposes on heterogeneous datasets
Developing structure-aware data valuation with efficient computation methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Extends the traditional data Shapley framework to asymmetric data Shapley
Incorporates inherent dataset structures for structure-aware valuation
Introduces an efficient k-nearest neighbor-based algorithm for exact computation
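
The difference between symmetric and asymmetric valuation can be illustrated with a brute-force toy sketch. It is not the paper's k-nearest neighbor algorithm: it simply enumerates permutations, and it assumes that asymmetry is expressed by keeping only the permutations that respect an ordering of groups (a hypothetical `group_of` mapping). The names `shapley`, `respects_order`, and `utility`, and the toy utility itself, are illustrative assumptions.

```python
from itertools import permutations


def shapley(players, utility, allowed=lambda perm: True):
    """Average each player's marginal contribution over the permutations
    accepted by `allowed` (all permutations gives the classical value)."""
    totals = {p: 0.0 for p in players}
    count = 0
    for perm in permutations(players):
        if not allowed(perm):
            continue
        count += 1
        coalition = frozenset()
        prev = utility(coalition)
        for p in perm:
            coalition = coalition | {p}
            cur = utility(coalition)
            totals[p] += cur - prev
            prev = cur
    return {p: total / count for p, total in totals.items()}


def respects_order(group_of):
    """Accept a permutation only if group ranks never decrease along it.
    `group_of` is a hypothetical mapping from player to group rank."""
    def check(perm):
        ranks = [group_of[p] for p in perm]
        return all(a <= b for a, b in zip(ranks, ranks[1:]))
    return check


def utility(coalition):
    """Toy utility: "a" and "a_copy" are duplicates, "b" is independent."""
    value = 0.0
    if "a" in coalition or "a_copy" in coalition:
        value += 1.0
    if "b" in coalition:
        value += 0.5
    return value


players = ["a", "a_copy", "b"]

# Classical (symmetric) value: the original and its copy split the credit.
symmetric = shapley(players, utility)

# Asymmetric value: "a_copy" is placed in a later group, so it always
# arrives after "a" and receives no credit for duplicated information.
asymmetric = shapley(
    players, utility, respects_order({"a": 0, "b": 0, "a_copy": 1})
)

print(symmetric)
print(asymmetric)
```

In this toy setting the symmetric value assigns 0.5 to each of the three points, while the asymmetric value assigns 1.0 to `a`, 0.5 to `b`, and 0.0 to `a_copy`. Both sum to the full-dataset utility of 1.5. The enumeration is factorial in the number of points, so it is only practical for very small examples.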