Measuring Dataset Diversity from a Geometric Perspective

📅 2026-02-10
🤖 AI Summary
Existing diversity metrics for datasets predominantly rely on statistical distributions or entropy, often overlooking the intrinsic geometric structure of the data. This work introduces persistence landscapes (PLs), a tool from topological data analysis, into diversity assessment, offering a geometric perspective to quantify structural diversity and establishing a direct link between geometric features and diversity. The proposed PLDiv metric is grounded in rigorous theoretical foundations and exhibits strong interpretability. Empirical evaluations across multimodal settings demonstrate its robustness and reliability, positioning it as a novel paradigm for dataset construction, augmentation, and evaluation.

๐Ÿ“ Abstract
Diversity can be broadly defined as the presence of meaningful variation across elements, which can be viewed from multiple perspectives, including statistical variation and geometric structural richness in the dataset. Existing diversity metrics, such as feature-space dispersion and metric-space magnitude, primarily capture distributional variation or entropy, while largely neglecting the geometric structure of datasets. To address this gap, we introduce a framework based on topological data analysis (TDA) and persistence landscapes (PLs) to extract and quantify geometric features from data. This approach provides a theoretically grounded means of measuring diversity beyond entropy, capturing the rich geometric and structural properties of datasets. Through extensive experiments across diverse modalities, we demonstrate that our proposed PLs-based diversity metric (PLDiv) is powerful, reliable, and interpretable, directly linking data diversity to its underlying geometry and offering a foundational tool for dataset construction, augmentation, and evaluation.
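The abstract does not give the PLDiv formula, so the following is only a minimal sketch of the ingredients it names, not the authors' metric. It assumes a Vietoris-Rips filtration restricted to 0-dimensional homology (connected components), where every class is born at 0 and dies at a minimum-spanning-tree edge length, and it uses the total area under the persistence landscape as a stand-in score. The function names (`h0_deaths`, `landscape`, `landscape_area`, `h0_landscape_score`) and the choice of score are illustrative assumptions.

```python
import math
from itertools import combinations


def h0_deaths(points):
    """Death times of 0-dim Vietoris-Rips classes: the MST edge lengths (Kruskal)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            # Two components merge at this scale, so one class dies here.
            parent[ri] = rj
            deaths.append(length)
    return deaths


def landscape(pairs, k, t):
    """k-th landscape function (k = 1 is the top layer) evaluated at t."""
    tents = sorted(
        (max(0.0, min(t - b, d - t)) for b, d in pairs), reverse=True
    )
    return tents[k - 1] if k <= len(tents) else 0.0


def landscape_area(pairs):
    """Area under all landscape layers; each (b, d) tent contributes (d - b)^2 / 4."""
    return sum((d - b) ** 2 / 4.0 for b, d in pairs)


def h0_landscape_score(points):
    """Hypothetical stand-in score (an assumption, not the paper's PLDiv)."""
    pairs = [(0.0, d) for d in h0_deaths(points)]
    return landscape_area(pairs)


# Two toy point clouds: the same square at two different scales.
tight = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
spread = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
```

Under these assumptions the more widely spread cloud scores higher, because its components persist over a longer range of scales. Capturing loops or voids would need higher-dimensional homology, which in practice comes from a dedicated TDA library rather than this hand-rolled sketch.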
Problem

Research questions and friction points this paper is trying to address.

dataset diversity
geometric structure
topological data analysis
persistence landscapes
diversity metrics
Innovation

Methods, ideas, or system contributions that make the work stand out.

topological data analysis
persistence landscapes
dataset diversity
geometric structure
diversity metric
Yang Ba
School of Computing and Augmented Intelligence, Arizona State University
Mohammad Sadeq Abolhasani
School of Computing and Augmented Intelligence, Arizona State University
Michelle V Mancenido
School of Mathematical and Natural Sciences
Rong Pan
Arizona State University
data science, statistical modeling, machine learning, quality and reliability engineering